Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

Posters

View Posters By Category

Session A: (July 22 and July 23)
Session B: (July 24 and July 25)

Presentation Schedule for July 22, 6:00 pm – 8:00 pm

Presentation Schedule for July 23, 6:00 pm – 8:00 pm

Presentation Schedule for July 24, 6:00 pm – 8:00 pm

Session A Poster Set-up and Dismantle
Session A Posters set up: Monday, July 22 between 7:30 am - 10:00 am
Session A Posters should be removed at 8:00 pm, Tuesday, July 23.

Session B Poster Set-up and Dismantle
Session B Posters set up: Wednesday, July 24 between 7:30 am - 10:00 am
Session B Posters should be removed at 2:00 pm, Thursday, July 25.

E-01: Chromosome conformation analysis of ependymoma tumors identifies putative target genes activated by distal oncogenic enhancers
COSI: RegSys COSI
  • Konstantin Okonechnikov, German Cancer Research Center, Germany

Short Abstract: By profiling enhancers in primary ependymoma tumors, we have recently identified putative oncogenes and molecular targets. To associate enhancers with their likely target genes, we relied on co-activation analysis within publicly available topologically associated domains. Nevertheless, unambiguous identification of enhancer target genes remains a challenge in the absence of chromosome conformation information. Consequently, we have now used the HiC technique to map the 3-dimensional organisation of tumor chromatin in the two most common and aggressive ependymoma subgroups: posterior fossa group A (PF-EPN-A) and supratentorial ependymomas with fusions involving the NF-κB subunit RELA (ST-EPN-RELA). By an integrative analysis of enhancer and gene expression in the context of the novel HiC data, we find that a large proportion of the previously predicted enhancer targets can be confirmed by physical interactions. Importantly, we also identify many new putative tumor-dependency genes activated by long-range promoter-enhancer interactions. Complementary to the analysis of gene-enhancer interactions, we also leveraged the HiC data for resolving structural rearrangements underlying copy number alterations frequently observed in PF-EPN-A tumors. Our preliminary results reveal complex inter-chromosomal rearrangements, which affect genes that potentially contribute to poor survival and thus could be promising candidates for medical application.

E-02: A pipeline for single-cell spatial transcriptomic data analysis and visualization
COSI: RegSys COSI
  • Qian Zhu, Harvard University, United States
  • Ruben Dries, Harvard University, United States
  • Arpan Sarkar, Harvard University, United States
  • Feng Bao, Tsinghua University, China
  • Nico Pierson, California Institute of Technology, United States
  • Long Cai, California Institute of Technology, United States
  • Guo-Cheng Yuan, Harvard University, United States

Short Abstract: Sequential fluorescence in situ hybridization (seqFISH) is a powerful technology for spatial transcriptomic profiling at single-cell resolution. However, computational methods for analyzing and visualizing seqFISH data are still lacking. To fill this gap, we have recently developed a hidden Markov random field (HMRF) model to systematically identify spatial domains with coherent gene expression patterns. We further integrated seqFISH and scRNAseq data to accurately map cell types. By combining cell-type and spatial-domain information, we were able to dissect the contributions of intrinsic cell-type specific gene expression signatures and location-dependent cell-cell interactions in mediating cellular heterogeneity. Here, we have generalized the previous work by developing an end-to-end pipeline for spatial transcriptomic data analysis and visualization, which can be used by biologists and bioinformatics experts alike. The pipeline takes raw images as input, carries out a series of analyses to extract molecular and morphological information, and then generates spatial maps of the cell state and local environment. The pipeline also contains a user-friendly, web-based portal to support interactive data visualization and exploration. In sum, this pipeline provides a much-needed tool for spatial transcriptomic analysis and visualization, which in turn are important for understanding tissue organization and functions.

E-03: CBNA: A control theory based method for identifying coding and non-coding cancer drivers
COSI: RegSys COSI
  • Vu Viet Hoang Pham, University of South Australia, Australia
  • Lin Liu, University of South Australia, Australia
  • Cameron Bracken, Centre for Cancer Biology and The University of Adelaide, Australia
  • Gregory Goodall, Centre for Cancer Biology, Australia
  • Qi Long, University of Pennsylvania, United States
  • Jiuyong Li, University of South Australia, Australia
  • Thuc Le, University of South Australia, Australia

Short Abstract: A key task in cancer genomics research is to identify cancer driver genes. Although several methods have been developed to discover cancer drivers, most of them only identify coding drivers. However, non-coding RNAs can regulate driver mutations to develop cancer. Hence, novel methods are required to reveal both coding and non-coding cancer drivers. In this paper, we develop a novel framework named Controllability based Biological Network Analysis (CBNA) to uncover coding and non-coding cancer drivers. CBNA integrates different genomic data types, including gene expression, gene network, and mutation data, and consists of a two-stage process: (1) Building a condition-specific network (i.e. cancer condition) and (2) Identifying drivers. The application of CBNA to the BRCA dataset demonstrates that it is more effective than the existing methods in detecting coding cancer drivers. In addition, CBNA also predicts 18 non-coding drivers for breast cancer. Some of them have been validated by literature and the rest are good candidates for wet-lab validation. We further use CBNA to detect subtype-specific cancer drivers, and several predicted drivers have been confirmed to be related to breast cancer subtypes.

E-04: TNBC.CMS: Consensus molecular subtype classifier for triple-negative breast cancer based on gene expression profile
COSI: RegSys COSI
  • Doyeong Yu, National Cancer Center, South Korea
  • Jihyun Kim, National Cancer Center, South Korea
  • Charny Park, National Cancer Center, Republic of Korea, South Korea

Short Abstract: Triple-negative breast cancer (TNBC) is the most heterogeneous breast cancer subtype and has an unfavorable prognosis. While there have been several attempts to identify molecular subtypes of TNBC based on gene expression profiles, signature optimization and classification performance have not yet been computationally validated. Here, we developed an R package, TNBC.CMS, to classify TNBC consensus molecular subtypes (CMS). TNBC.CMS embeds a meta-dataset of expression profiles and a classifier for users’ query profiles. Additionally, our package supports functional investigation: pathway activity, clinical relevance, and drug response. First, to establish the embedded learning model, we collected gene expression profiles from 14 public microarray datasets (n = 957) and aggregated them after eliminating batch effects. The meta-dataset was clustered into four subtypes: mesenchymal-like, immunomodulatory, luminal-AR, and stem-like. The CMS clusters were validated using gene signatures to elucidate the biological features of the subtypes. Next, we established a CMS classifier based on a machine-learning approach, which assigns a subtype to an input expression profile. In a further step, TNBC.CMS computationally predicts pathway activity and associated drug responses for each patient. Our gene signature analysis was clinically validated in an independent dataset. Finally, we expect that TNBC.CMS will enable precise TNBC patient diagnosis and anti-cancer therapy suggestions in the clinical field.

E-05: Topological structure analysis of Hi-C interaction graphs
COSI: RegSys COSI
  • Juris Viksna, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Gatis Melkus, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Peteris Rucevskis, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Edgars Celms, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Karlis Cerans, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Paulis Kikusts, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Lelde Lace, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Martins Opmanis, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Darta Rituma, Institute of Mathematics and Computer Science, University of Latvia, Latvia

Short Abstract: Current Hi-C technologies for chromosome conformation capture make it possible to understand a broad spectrum of functional interactions between genome elements. Although significant progress has been made, there are still many open questions regarding the best approaches for analysis of Hi-C data to identify biologically significant features. We present a novel method for analysis of Hi-C interaction networks with the goal of identifying characteristic topological features of interaction graphs and ascertaining their potential significance in chromatin architecture. When applied to a large PCHi-C dataset generated by The Babraham Institute and University of Cambridge for 17 haematopoietic cell types, this method showed that interaction graphs decompose into comparatively small connected components, which can be either partially shared by all cell types, or can be pronouncedly cell type-specific – i.e. largely conserved in a specific set of cell types and practically absent in others. The components identified were analysed further in conjunction with other Hi-C data, including ensemble Hi-C and single-cell Hi-C datasets, to pinpoint topological features of particular significance and assist the inference of cell and promoter-specific patterns in high-throughput chromatin conformation capture data. The software components developed for visualisation and exploration of Hi-C interaction graphs are publicly available at GitHub: https://github.com/IMCSBioinformatics/HiCGraphAnalysis.
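
The decomposition of an interaction graph into connected components mentioned in this abstract can be illustrated with a minimal sketch. The edge list and the fragment names ("p1", "f1", ...) are hypothetical, and the code is not taken from the HiCGraphAnalysis repository; it only shows the general graph operation.

```python
from collections import defaultdict

def connected_components(edges):
    """Return the connected components of an undirected graph as sorted node lists."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search from a node that has not been visited yet.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components

# Hypothetical promoter ("p") and fragment ("f") interactions for one cell type.
example_edges = [("p1", "f1"), ("f1", "f2"), ("p2", "f3"), ("p3", "f4"), ("f4", "p2")]
example_components = connected_components(example_edges)
```

Running the same function on the edge lists of several cell types and comparing the resulting components would be one simple way to see which components are shared and which are cell type-specific.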

E-06: Sequence-level instructions direct transcription at polyT short tandem repeats
COSI: RegSys COSI
  • Choé Bessière, CNRS, France
  • Manu Saraswat, CNRS, France
  • Mathys Grapotte, CNRS, France
  • Christophe Menichelli, CNRS, France
  • Jessica Severin, RIKEN, Japan
  • Michiel de Hoon, RIKEN, Japan
  • Charles-Henri Lecellier, IGMM - Univ. Montpellier - CNRS, France
  • Laurent Brehelin, LIRMM - Univ. Montpellier - CNRS, France
  • Wyeth W. Wasserman, The University of British Columbia, Canada

Short Abstract: Using the Cap Analysis of Gene Expression technology, the FANTOM5 consortium provided one of the most comprehensive maps of TSSs in several species. Strikingly, most of them could not be assigned to a specific gene and/or initiate at unconventional regions, outside promoters or enhancers. Here, to determine whether these unconventional TSSs, sometimes referred to as 'transcriptional noise' or 'junk', are relevant nonetheless, we look for novel and conserved regulatory motifs located in their vicinity. We show that, in all species studied, a significant fraction of CAGE peaks initiate at short tandem repeats (STRs) corresponding to homopolymers of thymidines. Biochemical and genetic evidence further demonstrate that several of these CAGEs correspond to TSSs of mostly sense and intronic non-coding RNAs, whose transcription rate can be predicted with ~81% accuracy by a sequence-based deep learning model. Excitingly, this model further reveals that genetic variants linked to human diseases affect this STR-associated transcription. Together, our results extend the repertoire of non-coding transcription and provide a valuable resource for future studies of complex traits.

E-07: Highly rearranged chromosomes reveal uncoupling between chromatin organization and gene expression
COSI: RegSys COSI
  • Yad Ghavi-Helm, European Molecular Biology Laboratory, Germany
  • Aleksander Jankowski, European Molecular Biology Laboratory, Germany
  • Sascha Meiers, European Molecular Biology Laboratory; Joint PhD degree from EMBL and Heidelberg University, Germany
  • Rebecca Rodríguez Viales, European Molecular Biology Laboratory, Germany
  • Jan O. Korbel, European Molecular Biology Laboratory, Germany
  • Eileen E.M. Furlong, European Molecular Biology Laboratory, Germany

Short Abstract: The three-dimensional organization of the genome is closely linked to regulation of gene expression. The activity of individual genes is orchestrated by DNA regulatory elements, often located at a large linear distance along the genome and brought into spatial proximity by intricate mechanisms driving genome topology. However, the functional consequences of genome topology remain unclear. Here, we systematically assessed the relationship between genome organization and gene expression using highly rearranged chromosomes (balancers) in Drosophila melanogaster. The rearranged chromosomes contain eight large inversions, thousands of smaller structural variants, and hundreds of thousands of single nucleotide variants. We assessed the impact of these genomic rearrangements on chromatin topology and gene expression in cis using a heterozygous cross, which minimizes the contribution of trans effects. We found that the rearrangements of balancer chromosomes affected genome topology at all scales, ranging from inter-TAD loops, through TAD boundaries, to intra-TAD interactions. Yet, surprisingly these changes were not predictive of changes in gene expression. Even around inversion breakpoints, where TADs were fused and long-range loops were disrupted, the expression of the majority of genes was not altered. Our results suggest that other factors, in addition to genome topology, determine the specificity and productivity of enhancer-promoter interactions.

E-08: Enhancer prediction in the human genome using supervised probabilistic modelling of epigenetic data
COSI: RegSys COSI
  • Maria Osmala, Aalto University, Finland
  • Harri Lähdesmäki, Aalto University, Finland

Short Abstract: The regulatory regions called enhancers are difficult to locate. Enhancers bind transcription factors and are occupied by nucleosomes with modified histones, features that are quantified by ChIP-seq assays. The ChIP-seq data is used as an input for unsupervised and supervised machine learning methods developed for enhancer prediction. However, the predictions made by different methods vary, they do not generalize between cell lines, and the choice of training data can affect the results. Moreover, the current methods do not utilize the shape of the ChIP-seq signal profiles efficiently. We have developed a classification tool for enhancer prediction. The shape of the data density around the positive and negative examples of enhancers is probabilistically modeled. The data originates from two ENCODE cell types. The predicted enhancers are computationally validated based on DNA-binding protein binding sites. We compare our enhancer predictions to those obtained by ChromHMM and RFECS. We study the effect of choosing the non-enhancer training data. Our method predicts genome-wide enhancers which are not identified by RFECS and ChromHMM, but which still validate as enhancers. The choice of training data can have a huge effect. The choice of different parameters affects the final results.

E-09: Dissecting Regulatory Networks at Single Cell Level with scHINT
COSI: RegSys COSI
  • Zhijian Li, Institute for Computational Genomics, RWTH Aachen University Medical School, Germany
  • Christoph Kuppe, Division of Nephrology and Clinical Immunology, RWTH Aachen University, Germany
  • Rafael Kramann, Division of Nephrology and Clinical Immunology, RWTH Aachen University, Germany
  • Ivan Costa, Institute for Computational Genomics, RWTH Aachen University, Germany

Short Abstract: We have recently proposed computational footprinting methods, which detect active binding sites of cells from open chromatin sequencing such as DNase-seq and ATAC-seq. Among others, footprints can be used to detect the exact location of transcription factor binding sites (TFBSs) and changes in transcription factor activity during cell differentiation. The combination of open chromatin with single-cell sequencing (scATAC-seq) enables dissecting regulatory features of all cells from complex tissues, as well as characterizing regulatory changes during the onset of diseases. The sparseness of scATAC-seq is much higher than that of scRNA-seq because the signal at a genomic locus is limited by DNA copy number and the number of regulatory features exceeds the number of genes. This imposes great methodological challenges in scATAC-seq analyses, such as classification of cell types, identification of regulatory elements and integration of data measured in distinct conditions: normal vs. disease. We, therefore, developed a novel method for quantification of open chromatin status of genomic regions (single cell HINT - scHINT), which mitigates the sparsity of scATAC-seq data.

E-10: A DNA methylation state transition model reveals the programmed epigenetic heterogeneity in pre-implantation embryos
COSI: RegSys COSI
  • Chengchen Zhao, School of Life Sciences and Technology, Tongji University, China
  • Naiqian Zhang, School of Life Sciences and Technology, Tongji University, China
  • Yong Zhang, School of Life Sciences and Technology, Tongji University, China

Short Abstract: Multiple studies have modeled DNA methylation transition during mitosis. However, those approaches are not suited for modeling the DNA methylation transition across one cell cycle at single-cell resolution. Here we built a probabilistic model describing the changes of DNA methylation level across one cell cycle. The transition matrix of this model describes the changes of DNA methylation during one cell cycle in three steps: passive demethylation by DNA replication, active DNA methylation changes affected by DNA methylation-modifying enzymes and DNA methylation combinations during homologous recombination. This model includes three parameters, u, d and p, to represent the probabilities of three active DNA methylation change types. When applying this model to early embryogenesis using public single-cell DNA methylome data, we found that the DNA methylation heterogeneity was largely programmed by the initial DNA methylation state in the zygote. Furthermore, such programmed DNA methylation heterogeneity is relevant to the mRNA heterogeneity among cells within the same embryo before the first cell-fate determination. In summary, our study built a DNA methylation transition model to quantify the active DNA methylation change probabilities, and revealed the programmed DNA methylation heterogeneity during early embryogenesis.

E-11: A deep learning-based approach to dissecting functional impact of DNA methylation in plants
COSI: RegSys COSI
  • Ngoc Tu Le, Okinawa Institute of Science and Technology, Japan
  • Hidetoshi Saze, Okinawa Institute of Science and Technology, Japan

Short Abstract: DNA methylation plays a critical role in maintaining genome integrity and regulating plant adaptability to environmental stresses. Recent studies have shown that DNA methylation can have a functional impact on these processes by regulating the binding of transcription factors, and thereby gene expression patterns. Experimental approaches to quantitatively measuring such impacts are hindered by technical difficulties. Here we present a data-driven approach to deciphering the impact of DNA methylation on transcription factor (TF) binding in the plant model Arabidopsis thaliana. Exploiting publicly available data, we have trained hundreds of deep convolutional neural network (CNN) models to automatically learn methylation-sensitive binding motifs and quantitatively evaluate their potential impacts on TF binding by in silico mutagenesis analysis. Importantly, our approach can be employed to predict TF binding in contexts where experimental binding data are unavailable or difficult to obtain. It, therefore, opens a way for large-scale investigations of the regulatory function of DNA methylation in plants under various developmental and environmental changes.

E-12: Gene network reconstruction using single cell transcriptomic data reveals key factors for autophagic process
COSI: RegSys COSI
  • Junil Kim, University of Copenhagen, Denmark
  • Kyoung Jae Won, University of Copenhagen, Denmark

Short Abstract: “What are the target genes?” is one of the most frequently asked questions in biology when trying to understand cellular processes. To answer it, experimental approaches including gain- and loss-of-function have been used. However, these approaches can catch indirect targets as well. Genome-wide analysis using chromatin immunoprecipitation followed by sequencing (ChIPseq) is useful but limited by the availability of the antibody. To understand transcriptional networks, we introduce TENET, a new algorithm to reconstruct gene regulatory networks (GRNs) from single cell RNAseq (scRNAseq) data. Based on transfer entropy, which measures the amount of directed transfer of information between two random processes, TENET identifies causal relationships between genes using the transcriptional profile aligned along the pseudo-time. Robust validation using ChIPseq and knockdown experiments found that the causal relationships predicted by TENET far outperformed those of other algorithms such as SCODE. Applying TENET to the scRNAseq data obtained during the autophagic process identified a key transcription factor (TF), factor X, that potentially controls the majority of autophagy-related genes. Knockdown of this factor X followed by green fluorescent protein (GFP)-tagged LC3-positive autophagosome further confirmed our results. Our results show that TENET identifies master regulators from single cell transcriptomic data.
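
Transfer entropy, the quantity this abstract builds on, can be sketched for two discrete time series as follows. This is an illustrative simplification and not TENET's implementation: it assumes already discretized values, a history length of 1, and plain plug-in frequency estimates, none of which are specified in the abstract. The function name and the toy series are hypothetical.

```python
from collections import Counter
from math import log2

def transfer_entropy(source, target):
    """Plug-in transfer entropy (in bits) from source to target, history length 1."""
    if len(source) != len(target) or len(target) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    triples = Counter()   # counts of (target_next, target_now, source_now)
    pairs_ts = Counter()  # counts of (target_now, source_now)
    pairs_tt = Counter()  # counts of (target_next, target_now)
    singles = Counter()   # counts of target_now
    n = len(target) - 1
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_ts[(y_now, x_now)] += 1
        pairs_tt[(y_next, y_now)] += 1
        singles[y_now] += 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        # p(y_next | y_now, x_now) versus p(y_next | y_now)
        p_with_source = count / pairs_ts[(y_now, x_now)]
        p_without_source = pairs_tt[(y_next, y_now)] / singles[y_now]
        te += p_joint * log2(p_with_source / p_without_source)
    return te

# Toy example: gene_b copies gene_a with a delay of one step along the ordering.
gene_a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
gene_b = [0] + gene_a[:-1]
te_a_to_b = transfer_entropy(gene_a, gene_b)
te_b_to_a = transfer_entropy(gene_b, gene_a)
```

In this toy case the estimate is larger in the direction gene_a to gene_b than in the reverse direction, which is the kind of asymmetry a transfer-entropy-based method uses to orient an edge. On short series the plug-in estimate is biased upwards, so real analyses need many more cells and some form of significance assessment.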

E-13: Combining transcriptional and post-transcriptional regulation to predict mutations altering the gene regulatory program in cancer cells
COSI: RegSys COSI
  • Jaime A Castro-Mondragon, Centre for Molecular Medicine Norway, Norway
  • Miriam Ragle Aure, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Norway
  • Vessela N Kristensen, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Norway
  • Anthony Mathelier, NCMM, University of Oslo, Norway

Short Abstract: MiRNAs are involved in gene regulation by inhibiting mRNA translation, and a single miRNA sequence may regulate hundreds of mRNAs. With miRNAs known to be involved in cancer initiation and progression, a better understanding of miRNA transcriptional regulation and its disruption in cancer is clearly required. By combining TFBSs and miRNA TSSs information with cancer patient data, we evaluated the combined effects of transcriptional and post-transcriptional dysregulation of gene expression with the alteration of miRNA regulation in cancer through cis-regulatory alterations. The analyses culminated in the identification of mutations at TFBSs affecting the expression of key protein-coding and miRNA genes with a cascading dysregulating effect on the cells’ regulatory program. Our predictions were enriched for protein-coding and miRNA genes previously annotated as potential cancer drivers. Functional enrichment analyses highlighted the dysregulation of key pathways associated with carcinogenesis. These results confirm that our method predicts cis-regulatory mutations related to the dysregulation of key gene regulatory networks in cancer patients. This new strategy represents an original methodology to decipher how the gene regulatory program is disrupted in cancer cells by combining transcriptional and post-transcriptional regulation of gene expression.

E-14: Gene expression signatures of cell death and proliferation - from confounding factors to biological insight and effective prediction
COSI: RegSys COSI
  • Bence Szalai, Semmelweis University, Budapest, Hungary, Hungary
  • Julio Saez-Rodriguez, Institute of Computational Biomedicine, Heidelberg University, Germany

Short Abstract: Large scale perturbation gene expression measurements are valuable data sources for functional genomic studies. Derived gene expression signatures can be used to infer compound mechanism of action or functional activities of different cellular processes. Linking perturbation signatures to phenotypic studies opens up the possibility to model cellular phenotypes from gene expression and to predict drugs interfering with the phenotype. In this study we linked cell viability phenotypic information upon genetic (Achilles screen) and drug (CTRP screen) perturbations with the corresponding gene expression signatures from the LINCS-L1000 screen for more than 90,000 pairs. Using this dataset we show that cell viability is a major factor behind gene expression signatures. Analysing the cell viability associated signature revealed transcription factors regulating cell death and proliferation. We used the gene expression - cell viability relationship to predict cell viability in the whole LINCS-L1000 dataset (>500,000 different perturbations), and revealed several compounds with cancer cell line specific toxic effects. We also show that cellular toxicity can lead to unexpected similarity of signatures, confounding mechanism of action discovery. Our results can help in understanding mechanisms behind cell death and proliferation and in removing confounding factors of transcriptomics perturbation screens, and show that expression signatures boost prediction of drug sensitivity.

E-15: Splotch: robust estimation of spatial gene expression and alignment of spatial and single-cell transcriptomic studies.
COSI: RegSys COSI
  • Tarmo Äijö, Flatiron Institute / CCB, United States
  • Richard Bonneau, Center for Data Science, New York University, New York, NY, USA, United States

Short Abstract: Spatial genomics technologies enable new approaches to study cells interacting and functioning in intact multicellular environments, but present technical and computational challenges including: alignment of multiple experiments into common spatial coordinates, dealing with low or uneven sampling and zero inflation, and integrating spatial data with very large corpora of relevant non-spatial data. We describe Splotch, a novel and extensible computational framework for the analysis of spatial genomics data that aligns multiple timepoints and tissue sections into a common spatial-temporal coordinate that is then used to generate improved posterior estimates of gene expression. Splotch relies on a flexible Bayesian quantification of uncertainty at all stages of the model and can be extended to many overall experimental designs. We incorporate anatomical regions, genotype and other experimental design parameters in a single computation. We give guidelines for the optimal design of spatial transcriptomics (ST) experiments and demonstrate alignment of a large corpus of single-cell (SC) data into a spatial-temporal coordinate automatically generated from ST data. We illustrate this method for aligning single-cell RNA-seq data into an automatically generated common coordinate by applying it to mouse spinal column and olfactory bulb SC and ST data sets.

E-16: TFmotifView: a webserver for the visualization of transcription factor motifs in genomic regions
COSI: RegSys COSI
  • Anaïs Bardet, CNRS UMR7242 - University of Strasbourg, France

Short Abstract: In recent years, the binding specificities of many TFs have been deciphered and summarised as position weight matrices, also called TF motifs. Despite the availability of thousands of known TF motifs in databases, it remains non-trivial to quickly query and visualise the enrichment of known TF motifs in genomic regions of interest. Towards this goal, we have developed TFmotifView, a web server for studying the distribution of known transcription factor (TF) motifs in genomic regions of interest. Based on input genomic regions and selected TF motifs, TFmotifView performs an overlap of the genomic regions with TF motif occurrences to generate three different outputs: (1) an enrichment table and scatterplot calculating the significance of TF motif occurrences in genomic regions compared to control regions, (2) a genomic view of the organisation of TF motifs in each genomic region, (3) a metaplot summarising the position of TF motifs relative to the center of the regions.
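
The first output, the significance of motif occurrences in input regions compared to control regions, could be computed in several ways. The abstract does not say which test TFmotifView uses, so the sketch below assumes a one-sided hypergeometric test on the number of regions containing at least one motif occurrence. The function name, parameter names and counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_pvalue(hits_in_regions, n_regions, hits_in_controls, n_controls):
    """One-sided p-value: probability of seeing at least `hits_in_regions` motif-containing
    regions when `n_regions` are drawn at random from the pooled input and control regions."""
    total = n_regions + n_controls
    total_hits = hits_in_regions + hits_in_controls
    upper = min(n_regions, total_hits)
    denominator = comb(total, n_regions)
    # Sum the hypergeometric probabilities for k = observed hits up to the maximum possible.
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_regions - k)
        for k in range(hits_in_regions, upper + 1)
    )
    return tail / denominator

# Hypothetical counts: motif found in 3 of 3 input regions and 0 of 3 control regions.
p_value = hypergeom_enrichment_pvalue(3, 3, 0, 3)
```

With these toy counts only one of the 20 possible ways of picking 3 regions out of 6 contains all three motif-bearing regions, so the p-value is 1/20.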

E-17: Pan-cancer identification of transcription factors associated with aberrant DNA methylation patterns
COSI: RegSys COSI
  • Roza Berhanu Lemma, Center for Molecular Medicine Norway (NCMM), University of Oslo, Norway
  • Anthony Mathelier, NCMM, University of Oslo, Norway

Short Abstract: Methylation of CpGs at promoters and enhancers represents a major epigenetic DNA modification involved in transcriptional regulation. Aberrant DNA methylation patterns have recurrently been associated with dysregulation of the regulatory program in cancer cells. By combining DNA methylation arrays and gene expression data from TCGA with transcription factor (TF) binding sites, we explored the interplay between TF binding and DNA methylation in cancers. We hypothesized that aberrant methylation patterns could be triggered by binding of specific TFs. This was assessed by studying the correlation between the expression level of TFs and the methylation level at their binding regions. Specifically, for each TF, we performed expression-methylation quantitative trait loci computations and estimated the proportion of CpGs in the TF binding regions with methylation level correlated with the TF’s expression. The TFs with the highest proportion of correlated CpG methylation are most likely to be associated with aberrant DNA methylation patterns. We identified 18 TFs as outliers, with high correlation between expression and demethylation at CpGs close to their binding sites. These TFs were significantly enriched for pioneering function, suggesting a special role for these pioneer TFs in modulating the chromatin structure and thereby the transcriptional profile in cancer patients.
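
The per-TF summary described in this abstract, the proportion of CpGs whose methylation is correlated with the TF's expression, can be sketched roughly as follows. This is a simplification under stated assumptions: it uses a plain Pearson correlation with a fixed cutoff instead of the expression-methylation quantitative trait loci computations used in the study, and the threshold, CpG identifiers and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def proportion_correlated(tf_expression, cpg_methylation, threshold=0.8):
    """Fraction of CpGs whose methylation has |r| >= threshold with the TF's expression."""
    correlated = sum(
        1 for values in cpg_methylation.values()
        if abs(pearson(tf_expression, values)) >= threshold
    )
    return correlated / len(cpg_methylation)

# Hypothetical data: one TF measured in four samples, three CpGs in its binding regions.
tf_expression = [1.0, 2.0, 3.0, 4.0]
cpg_methylation = {
    "cg_a": [0.9, 0.7, 0.5, 0.3],  # methylation drops as expression rises
    "cg_b": [0.5, 0.5, 0.5, 0.5],  # constant methylation
    "cg_c": [0.2, 0.8, 0.3, 0.7],  # weak relationship
}
proportion = proportion_correlated(tf_expression, cpg_methylation)
```

Only "cg_a" passes the cutoff here, giving a proportion of one third. Ranking TFs by this proportion and flagging outliers would mirror the last step described in the abstract.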

E-18: Comprehensive pipeline for processing, deconvolution and visualization of complex DNA methylation data
COSI: RegSys COSI
  • Shashwat Sahay, Saarland University, Germany
  • Tony Kaoma, Luxembourg Institute of Health, Luxembourg
  • Valentin Maurer, University of Heidelberg, Germany
  • Francisco Azuaje, Luxembourg Institute of Health, Luxembourg
  • Joern Walter, Saarland University, Germany
  • Pavlo Lutsik, German cancer research center, Germany
  • Reka Toth, German cancer research center, Germany
  • Petr Nazarov, Luxembourg Institute of Health, Luxembourg
  • Michael Scherer, Max-Planck Institute for Informatics, Germany

Short Abstract: DNA methylation patterns are cell-type specific and key factors determining cellular identity. Thus, reference DNA methylation profiles can be used to infer cellular composition of bulk tissue samples. However, these reference profiles are hard to obtain and not always well-defined. Reference-free methods for estimating cellular composition aim to infer both the proportions and the methylomes of underlying cell types. MeDeCom is an R package for the reference-free deconvolution of complex methylomes. However, input data cannot contain unreliable or otherwise problematic sites, and a pre-selection of potentially informative sites has to be performed. To alleviate this, we propose a new processing package (DecompPipeline) for reference-free deconvolution algorithms, which is not limited to MeDeCom. Furthermore, we present a substantially revised visualization tool (FactorViz) to explore MeDeCom's results. FactorViz aids users in understanding the estimated reference profiles. We applied these packages to a lung adenocarcinoma data set and found indications of tumor infiltrating immune cells. The analysis of a whole blood data set revealed substantial heterogeneity across the samples, highlighting the risk of using whole blood as surrogate tissue in epigenome-wide association studies. DecompPipeline, MeDeCom, and FactorViz offer a comprehensive pipeline for routinely dissecting heterogeneous samples in epigenomic studies.

E-19: Characterizing cell-type specific gene regulation from single-cell transcriptome profiling
COSI: RegSys COSI
  • Padvitski Tsimafei, CECAD, University of Cologne, Belarus
  • Andreas Beyer, Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), Germany

Short Abstract: Understanding principles of gene regulation is pivotal for engineering biological systems. Single-cell transcriptomics (scRNA-seq) allows thousands of cells to be profiled in one experiment, giving us the power to computationally infer principles of gene regulation at a new level of granularity. In this study we applied regularized regression to published scRNA-seq data from more than 100 different mouse cell types in order to infer cell-type specific gene-regulatory networks (GRNs). Data from different scRNA-seq technologies led to distinct GRN structures, suggesting that technology has a major influence on the networks. However, we also found that networks from the same cell types were more similar than expected by chance. Hence, single-cell data enable the detection of true biological signal. Further, this analysis enabled us to distinguish gene-gene interactions that were invariant across cell types from cell-type specific interactions. Our work lays the foundation for understanding principles of GRN adaptations.

E-20: Cell type and brain region-specific chromatin abnormalities in dopamine depleted mice
COSI: RegSys COSI
  • Alyssa Lawler, Carnegie Mellon University, United States
  • Ashley R. Brown, Carnegie Mellon University, United States
  • Rachel S. Bouchard, Carnegie Mellon University, United States
  • Irene M. Kaplow, Carnegie Mellon University, United States
  • Noelle Toong, Carnegie Mellon University, United States
  • Chaitanya Srinivasan, Carnegie Mellon University, United States
  • Yeonju Kim, Carnegie Mellon University, United States
  • Naomi Shin, Carnegie Mellon University, United States
  • Aryn H. Gittis, Carnegie Mellon University, United States
  • Andreas R. Pfenning, Carnegie Mellon University, United States

Short Abstract: Neuron subtype dysfunction is a key contributor to the motor deficits observed in dopamine depleted mouse models of Parkinson's Disease. Specifically, parvalbumin-expressing (PV+) neurons in the external globus pallidus (GPe) spike less frequently in the dopamine depleted state, and cell type-specific optogenetic stimulation of GPe PV+ neurons (but not indiscriminate GPe stimulation) rescues normal motor behavior [1]. Yet, the molecular properties underlying these electrophysiological changes remain unknown. We apply affinity purification to isolate PV+ and PV- nuclei from three brain regions of healthy and dopamine depleted mice for targeted epigenetic assessment: the GPe, striatum, and isocortex. Using the Assay for Transposase-Accessible Chromatin (ATAC-seq), we identify regional changes in open chromatin in PV+ and PV- cell types after dopamine depletion. Additionally, we characterize region and cell type-specific open chromatin sites containing Parkinson’s Disease-associated variants, connecting potential disease mechanisms through genetic predisposition to gene regulation to pathophysiology. These results provide new insight into the molecular progression of Parkinson’s Disease at the resolution of individual cell types and tissues. Moreover, they suggest new candidate gene therapy targets in patients. 1. Mastro, K. J. et al. Cell-Specific Pallidal Intervention Induces Long-Lasting Motor Recovery in Dopamine Depleted Mice. Nat. Neurosci. 20, 815–823 (2017).

E-21: Differential analysis of transcription factor activity from gene expression
COSI: RegSys COSI
  • Viren Amin, Baebies Inc, United States
  • Didem Agac, MD Anderson Cancer Center, United States
  • Spencer Barnes, UT Southwestern Medical Center, United States
  • Murat Can Cobanoglu, UT Southwestern Medical Center, United States

Short Abstract: We present EPEE (Effector and Perturbation Estimation Engine), a method for differential analysis of transcription factor (TF) activity from gene expression data. EPEE addresses a number of critical challenges unmet by existing approaches. Firstly, EPEE collectively models all TF activity in a single multivariate model. This accounts for the intrinsic coupling among TFs that share targets, which is highly frequent. Secondly, EPEE incorporates context-specific TF-gene regulatory networks and therefore adapts the analysis to each biological context. Finally, EPEE can flexibly reflect different regulatory activity of a single TF among its potential targets. This makes it possible to implicitly recover other regulatory influences such as co-activators or repressors. We show that addressing the aforementioned challenges enables EPEE to outperform alternative methods and reliably produce accurate results.

E-22: FORGE2: cell type-specific signal analysis for GWAS across multiple epigenomic datasets
COSI: RegSys COSI
  • Charles Breeze, Altius Institute for Biomedical Sciences, United States
  • Alex Reynolds, Altius Institute for Biomedical Sciences, United States

Short Abstract: GWAS SNPs are known to co-localise with active regulatory elements in tissues and cell types relevant to disease aetiology. Further characterisation of cell type-specific signal for GWAS SNPs is anticipated to broaden our understanding of GWAS-mediated disease pathology. The FORGE tool, among others, initially automated this approach. However, FORGE only focused on DHSs, and a remarkable reservoir of regulatory marks, including histone marks, remains underanalysed by this approach. We took data for 5 histone marks (H3K4me1, H3K4me3, H3K36me3, H3K27me3 and H3K9me3) across 39 cell types and tissues and analysed the entire GWAS catalogue for cell type-specific signal applying BY correction with a false positive rate of >0.001%. Further, we also designed a new analysis approach using TF motif data, implemented as the FORGE2-TF tool (https://forge2-tf.altiusinstitute.org/). We identified 15 previously undetected tissue/cell type-disease associations, covering different classes including additional tissues, new tissues, higher enrichment or different tissues. Diseases for which new associations were detected range from autoimmune diseases such as psoriasis to psychiatric conditions such as ADHD. In addition, tissue-specific enrichments for the repressive H3K27me3 mark were identified, suggesting a role for polycomb-repressed regions in GWAS-mediated disease aetiology. Thus, FORGE2 analyses offer insights into the regulatory mechanisms underlying GWAS SNP action.
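The abstract names "BY correction" without further detail. As a rough illustration only, the sketch below computes adjusted p-values with the standard Benjamini-Yekutieli step-up procedure; it is not taken from FORGE2, and the function name and example p-values are assumptions made for this example.

```python
def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted p-values in the input order.

    Hypothetical helper for illustration; not part of FORGE2.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic-sum penalty c(m) = 1 + 1/2 + ... + 1/m, which makes the
    # procedure valid under arbitrary dependence between tests.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, e.g. one enrichment test per tissue.
example_p = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_yekutieli(example_p)
```

A test is then called significant when its adjusted p-value falls below the chosen threshold.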

E-23: Epigenomics and Single-Cell Sequencing Define Cellular Heterogeneity in Langerhans Cell Histiocytosis
COSI: RegSys COSI
  • Florian Halbritter, Children's Cancer Research Institute, St. Anna Kinderkrebsforschung; CeMM Research Center for Molecular Medicine, Austria
  • Matthias Farlik, CeMM Research Center for Molecular Medicine; Department of Dermatology, Medical University of Vienna, Austria
  • Raphaela Schwentner, St. Anna Kinderkrebsforschung, Children's Cancer Research Institute, Austria
  • Thomas Schnoeller, St. Anna Kinderkrebsforschung, Children's Cancer Research Institute, Austria
  • Nikolaus Fortelny, CeMM Research Center for Molecular Medicine, Austria
  • Caroline Hutter, St. Anna Kinderkrebsforschung, Children's Cancer Research Institute; St. Anna Children's Hospital, Austria
  • Hanja Pisa, Children's Cancer Research Institute (CCRI), Austria
  • Christoph Bock, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Austria

Short Abstract: Langerhans Cell Histiocytosis (LCH) is a neoplastic disease characterized by the accumulation of CD1a+CD207+ cells of unknown origin in affected tissues. Constitutive activation of the ERK signaling pathway is a common feature of LCH, but the exact pathogenesis remains unclear. Here we present a comprehensive analysis of the composition of LCH lesions in order to gain insight into the development and molecular mechanisms of this disease. We used a multi-layered approach employing immunohistochemistry, flow cytometry, single-cell RNA-sequencing, and ATAC-sequencing of patient biopsies, combined with integrative computational analysis. Based on these data, we identified multiple cell subsets within LCH lesions, including a proliferative progenitor state. Further analysis suggests a hierarchy between these cells with a directed developmental program in each tumor. Finally, chromatin mapping inferred regulatory signatures discriminating these subsets. Taken together, our study shows at single-cell resolution that there is considerable intra-tumor heterogeneity in LCH and provides insight into the molecular framework underlying tumor development. Our transcriptomic and epigenomic analyses have generated a reference map of tumor cells. We propose that lesions are composed of different LCH cell subsets that arise in an intrinsic developmental process. This work can serve as a template for the analysis of tumorigenesis beyond LCH.

E-24: Cross-species Analysis Reveals Regulatory Loci Underlying Limb Diversity
COSI: RegSys COSI
  • Shalu Jhanwar, University of Basel - Department of Biomedicine, Switzerland
  • Jonas Malkmus, University of Basel - Department of Biomedicine, Switzerland
  • Jens Stolte, University of Basel - Department of Biomedicine, Switzerland
  • Aimee Zuniga, University of Basel - Department of Biomedicine, Switzerland
  • Rolf Zeller, University of Basel - Department of Biomedicine, Switzerland

Short Abstract: Understanding the origin of morphological diversity across vertebrates is central to evolutionary developmental biology. Mouse and chicken share a conserved transcriptome yet possess distinct limb morphology. With the aim of determining cis-regulatory elements (REs) underlying limb diversity, we propose a comprehensive approach integrating next-generation sequencing and genetics within a comparative framework. We hypothesize that the phenotypic differences in forelimbs of mouse and chicken arise for two reasons. Firstly, rapidly evolving REs called chicken accelerated regions (CARs) might explain the divergence in limb morphology of chicken. In total, we found 300 CARs by first identifying conserved enhancers across species using the PhastCons algorithm, followed by PhyloP to check for accelerated nucleotide substitutions in the chicken genome. Secondly, differential chromatin accessibility of REs between the two species might explain the phenotypic differences. To this end, we performed ATAC-seq and RNA-seq of the developing limb. Using an unbiased clustering approach, we identified six temporal motifs of chromatin accessibility and expression in both species. Interestingly, we found differentially enriched transcription factor (TF) binding motifs of key TFs across accessibility modules. Finally, we intend to construct regulatory networks involving key players of limb development that may shed light on the developmental stage- and species-specific transcriptional regulation.

E-25: Identification and categorization of rare single base variations in scRNA-Seq data
COSI: RegSys COSI
  • Dena Leshkowitz, Weizmann Institute of Science, Israel
  • Refael Kohen, Weizmann Institute, Israel

Short Abstract: We use 10x Chromium scRNA-Seq (single cell RNA-Seq) data to identify single nucleotide variations that can originate from RNA editing and complement the gene expression information. In our tool, we screen aligned sequences (BAM files) for a variation at an exonic genomic location. A transcript is identified by the UMI (Unique Molecular Identifier) and the cell barcode. We analyze each individual cell barcode and classify its UMIs into three types. UMIs may exhibit the same variation in either all, some or none of the reads and are termed "complete", "partial" or "none", respectively. The number of UMIs per class and their read counts are reported for each variation detected. This approach allows us to distinguish random sequencing or PCR errors (which will appear as "partial" conversions) from true variations, which can result from RNA editing events such as A-to-I that are evident as A-to-G upon sequencing.
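The UMI classification described above can be sketched as follows. This is a minimal illustration rather than the authors' tool: it assumes the reads covering one genomic position have already been extracted from the BAM file into (cell barcode, UMI, carries-variant) tuples, and all names are hypothetical.

```python
from collections import defaultdict


def classify_umis(reads):
    """Classify UMIs per cell barcode as "complete", "partial" or "none".

    reads: iterable of (cell_barcode, umi, has_variant) tuples for a single
    genomic position (an assumed, simplified input format).
    """
    # For each (cell, UMI) pair: [reads carrying the variant, total reads].
    counts = defaultdict(lambda: [0, 0])
    for cell, umi, has_variant in reads:
        entry = counts[(cell, umi)]
        if has_variant:
            entry[0] += 1
        entry[1] += 1

    summary = defaultdict(lambda: {"complete": 0, "partial": 0, "none": 0})
    for (cell, _umi), (variant_reads, total_reads) in counts.items():
        if variant_reads == total_reads:
            label = "complete"   # every read of this UMI shows the variation
        elif variant_reads == 0:
            label = "none"       # no read of this UMI shows the variation
        else:
            label = "partial"    # mixed reads, consistent with PCR or sequencing error
        summary[cell][label] += 1
    return {cell: dict(classes) for cell, classes in summary.items()}


# Made-up reads for two cells.
example_reads = [
    ("AAAC", "U1", True), ("AAAC", "U1", True),
    ("AAAC", "U2", True), ("AAAC", "U2", False),
    ("AAAC", "U3", False),
    ("TTTG", "U1", False),
]
result = classify_umis(example_reads)
```

The same UMI sequence seen under two different cell barcodes is treated as two separate transcripts, matching the abstract's statement that a transcript is identified by the UMI together with the cell barcode.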

E-26: UTAP: User-friendly Transcriptome Analysis Pipeline
COSI: RegSys COSI
  • Dena Leshkowitz, Weizmann Institute of Science, Israel
  • Refael Kohen, Weizmann Institute, Israel
  • Jonathan Barlev, Weizmann Institute of Science, Israel
  • Gil Hornung, Weizmann Institute of Science, Israel
  • Gil Stelzer, Weizmann Institute of Science, Israel
  • Ester Feldmesser, Weizmann Institute of Science, Israel
  • Kiril Kogan, Weizmann Institute of Science, Israel
  • Marilyn Safran, Weizmann Institute of Science, Israel

Short Abstract: Background: RNA-Seq technology is routinely used to characterize the transcriptome, and to detect gene expression differences among cell types, genotypes and conditions. Advances in short-read sequencing instruments such as Illumina NextSeq have yielded easy-to-operate machines, with high throughput, at a lower price per base. However, processing this data requires bioinformatics expertise to tailor and execute specific solutions for each type of library preparation. Results: In order to enable fast and user-friendly data analysis, we developed an intuitive and scalable transcriptome pipeline that executes the full process, starting from cDNA sequences derived by RNA-Seq and bulk MARS-Seq and ending with sets of differentially expressed genes. Output files are placed in structured folders, and results summaries are provided in rich and comprehensive reports, containing dozens of plots, tables and links. Conclusion: Our User-friendly Transcriptome Analysis Pipeline (UTAP) is an open source, web-based intuitive platform available to the biomedical research community, enabling researchers to efficiently and accurately analyse transcriptome sequence data. Availability: Information about where to download the UTAP Docker application can be found at https://utap.readthedocs.io Published work: Kohen et al. BMC Bioinformatics (2019) 20:154

E-27: CRUP: A comprehensive framework to predict condition-specific regulatory units
COSI: RegSys COSI
  • Verena Heinrich, Max Planck Institute for Molecular Genetics, Germany
  • Anna Ramisch, Max Planck Institute for Molecular Genetics, Germany
  • Martin Vingron, Max Planck Institute for Molecular Genetics, Germany

Short Abstract: We present the software CRUP (Condition-specific Regulatory Units Prediction) to infer from epigenetic marks a list of regulatory units consisting of dynamically changing enhancers with their target genes. The three-step workflow starts with a novel pre-trained enhancer predictor that can be reliably applied across tissues and species, based solely on ChIP-seq data for three histone modifications. Enhancers are subsequently assigned to different conditions and correlated with gene expression within the same topologically associated domain to derive regulatory units. We thoroughly test and then apply CRUP to a rheumatoid arthritis model, identifying enhancer-gene pairs comprising known disease genes as well as new candidate genes.
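The final step, correlating enhancers with gene expression inside the same topologically associated domain, might look roughly like the sketch below. It is not CRUP's implementation: the use of Pearson correlation, the 0.9 cut-off, the input layout and every name are assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def regulatory_units(enhancers, genes, min_corr=0.9):
    """Pair each enhancer with correlated genes located in the same TAD.

    enhancers, genes: dict mapping name -> (tad_id, values across conditions).
    min_corr is an arbitrary illustrative threshold.
    """
    units = []
    for enh_name, (enh_tad, enh_values) in enhancers.items():
        for gene_name, (gene_tad, gene_values) in genes.items():
            if enh_tad != gene_tad:
                continue  # only consider pairs within one domain
            r = pearson(enh_values, gene_values)
            if r >= min_corr:
                units.append((enh_name, gene_name, round(r, 3)))
    return units


# Made-up enhancer activities and expression values over three conditions.
example_enhancers = {
    "enh1": ("TAD_A", [0.1, 0.5, 0.9]),
    "enh2": ("TAD_B", [0.9, 0.5, 0.1]),
}
example_genes = {
    "geneX": ("TAD_A", [1.0, 5.0, 9.0]),
    "geneY": ("TAD_A", [9.0, 5.0, 1.0]),
    "geneZ": ("TAD_B", [2.0, 2.0, 2.0]),
}
units = regulatory_units(example_enhancers, example_genes)
```

Restricting the search to one domain keeps the number of candidate pairs small and encodes the assumption that enhancers mostly act on genes in their own domain.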

E-28: ReMap 2020: Update of the regulatory regions atlas from an integrative analysis of ChIP-seq experiments for human and plants
COSI: RegSys COSI
  • Allyssa Douida, Université Aix-Marseille, France
  • Adele Simler, Université Aix-Marseille, France
  • Wassim Rhalloussi, Université Aix-Marseille, France
  • Martin Mestdagh, Université Aix-Marseille, France
  • Zacharie Ménétrier, Université Aix-Marseille, France
  • Thomas Rosnet, INSERM, France
  • Aurélie Bergon, INSERM, France
  • Fabrice Lopez, INSERM, France
  • Lionel Spinelli, INSERM, France
  • Benoît Ballester, INSERM, France
  • Jeanne Chèneby, Aix Marseille Univ, INSERM, UMR U1090, TAGC, Marseille, France, France

Short Abstract: Transcription Regulators (TRs) comprise transcription factors, transcriptional coactivators and chromatin-remodeling factors; they drive gene transcription and the organization of chromatin through DNA binding. The development and popularisation of the ChIP-seq technique has led to an exponential increase in TR occupancy datasets in public databases. Large scale integrative studies of such data offer significant insights into the mechanisms by which a TR selects its binding regions in each cellular environment. The integration processes and the underlying detection of TF binding sites are challenging because of the heterogeneity of database formats and of the pipelines used to process ChIP-seq data. Quality data integration therefore requires manual uniform annotation and reprocessing of the raw ChIP-seq data. ReMap 2015 was the first large scale integrative initiative in which ChIP-seq data were reprocessed and quality controlled, offering significant insights into the complexity of the human regulatory landscape. In 2018 an update was made with new data. We are currently working on the ReMap 2020 update with a new pipeline ensuring analysis reproducibility and pipeline portability in processing ChIP-seq data. We went from 400 ChIP-seq experiments in 2015 to an expected 6700 for the ReMap 2020 update.

E-29: Non-optimal codon usage affects protein expression levels as a means of cell-cycle regulation
COSI: RegSys COSI
  • Mahua Bhattacharya, Bar Ilan University, Israel
  • Dorith Raviv Shay, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel
  • Milana Frenkel-Morgenstern, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel

Short Abstract: Protein expression depends on optimal codon usage, that is, on the pairing of a tRNA molecule with its corresponding codon. Since codons have different affinities for their anti-codons, they are translated with different efficiencies. A study shows that cell-cycle regulated genes are biased towards low-affinity codons. Since the translation process depends on the abundance of tRNA and ATP, a decrease in their concentration during the cell-cycle phases could result in a lower translation rate for genes encoded by low-affinity codons. That observation was made in silico for human genes, which led us to hypothesise that proteins encoded with low-affinity codons will show oscillations in their protein dynamics patterns during the cell cycle, while similar proteins encoded with high-affinity codons will show constant protein dynamics throughout the different cell-cycle phases. We tested this by expressing proteins with different codons and measuring protein levels in different phases of the cell cycle. Our findings suggest that there is a difference in the protein dynamics patterns of proteins encoded by low- and high-affinity codons: a protein encoded with mainly low-affinity codons showed oscillations during the cell-cycle phases, while a similar protein encoded with high-affinity codons showed a constant pattern.

E-30: Integration of multi-omics data elucidates the genetic control of cellular signaling
COSI: RegSys COSI
  • Jan Grossbach, Systems Biology, CECAD, University of Cologne, Germany
  • Ludovic Gillet, Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
  • Mathieu Clément-Ziza, Lesaffre, Marcq-en-Baroeul, France, France
  • Corinna Lewis Schmalohr, Systems Biology, CECAD, University of Cologne, Germany
  • Olga Schubert, Department of Human Genetics, University of California, Los Angeles, United States
  • Christopher Barnes, Novo Nordisk Research Center Seattle, Inc., Seattle, WA, United States
  • Isabell Bludau, Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
  • Ruedi Aebersold, Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
  • Andreas Beyer, Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), Germany

Short Abstract: Genomic variation affects cellular networks by altering diverse molecular layers such as RNA levels, protein abundance, and post-translational protein modifications. However, how these different layers are affected by genetic polymorphisms and give rise to complex physiological phenotypes remains unclear. To address these questions, we generated high-quality transcriptome, proteome, and phosphoproteome data for a panel of 112 genetically diverse yeast strains. We found that, while genetic effects on transcript abundances are usually transmitted to the protein level, there was a significant uncoupling of the transcript-protein relationship for certain protein classes. One example is the subunits of protein complexes, which the cell maintains at levels that reflect their stoichiometries in the complex. By integrating our phosphoproteomics data with the transcript and protein abundances we found that the same genetic locus often affected distinct cellular sub-networks within each of these layers. Furthermore, the number of protein phosphosites associated with a given locus was more predictive of its influence on cellular growth traits than any other molecular layer that we investigated. This study paves the way for a better understanding of how multi-layered molecular networks mediate the effects of genomic variants on complex physiological traits.

E-31: Analysis of the structural variability of topologically associated domains as revealed by Hi-C
COSI: RegSys COSI
  • Natalie Sauerwald, Carnegie Mellon University, United States
  • Akshat Singhal, Stony Brook University, United States
  • Carl Kingsford, Carnegie Mellon University, United States

Short Abstract: Three-dimensional chromosome structure plays an integral role in gene expression and regulation, replication timing, and other cellular processes. Topologically associating domains (TADs), building blocks of chromosome structure, are genomic regions with higher contact frequencies within the region than outside the region. A central question is the degree to which TADs are conserved or vary between conditions. We analyze a set of 137 Hi-C samples from 9 different studies under 3 measures in order to quantify the effects of various sources of biological and experimental variation. We observe significant variation in TAD sets between both non-replicate and replicate samples, and study variability across tissue samples and parent-parent-child trios. We also find that samples can have protocol-specific structural changes, but that TADs are generally robust to lab-specific differences. This study represents a systematic quantification of the key factors influencing comparisons of chromosome structure, suggesting significant variability and the potential for cell-type-specific structural features, which has yet to be explored. The lack of observed influence of heredity and individual genetic differences on TADs suggests that we should look to factors other than the genetic sequence for the drivers of this structure, which plays such an important role in human disease and cellular functioning.
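The definition of a TAD given above (higher contact frequencies within the region than outside it) can be made concrete with a small sketch. This is only an illustration of the definition, not one of the three measures used in the study; the function name, the half-open bin interval and the toy matrix are assumptions.

```python
def tad_contact_contrast(matrix, start, end):
    """Return (mean contact inside the domain, mean contact leaving the domain).

    matrix: symmetric Hi-C contact matrix as a list of lists.
    start, end: bin indices of the candidate domain, half-open [start, end).
    """
    n = len(matrix)
    # Contacts between two distinct bins that both lie inside the domain.
    inside = [matrix[i][j] for i in range(start, end) for j in range(i + 1, end)]
    # Contacts between a bin inside the domain and a bin outside it.
    outside = [
        matrix[i][j]
        for i in range(start, end)
        for j in range(n)
        if j < start or j >= end
    ]
    mean_inside = sum(inside) / len(inside) if inside else 0.0
    mean_outside = sum(outside) / len(outside) if outside else 0.0
    return mean_inside, mean_outside


# Toy 4-bin matrix with two apparent domains: bins 0-1 and bins 2-3.
example_matrix = [
    [0, 8, 1, 1],
    [8, 0, 2, 1],
    [1, 2, 0, 9],
    [1, 1, 9, 0],
]
mean_in, mean_out = tad_contact_contrast(example_matrix, 0, 2)
```

A region behaves like a TAD under this definition when the first value is clearly larger than the second.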

E-32: Topological data analysis reveals principles of chromosome structure throughout cellular differentiation
COSI: RegSys COSI
  • Natalie Sauerwald, Carnegie Mellon University, United States
  • Yihang Shen, Carnegie Mellon University, United States
  • Carl Kingsford, Carnegie Mellon University, United States

Short Abstract: Three-dimensional chromosome structure has a significant influence on many diverse genomic processes and has recently been shown to relate to cellular differentiation. Many methods for describing the chromosomal architecture focus on specific substructures such as topologically associating domains (TADs) or compartments, but we are still missing a global view of all geometric features of chromosomes. Topological data analysis (TDA) is a mathematically well-founded set of methods to derive robust information about the structure and topology of data sets, making it well-suited to better understand the key features of chromosome structure. By applying TDA to the study of chromosome structure through differentiation across three cell lines, we provide insight into principles of chromosome folding generally, and observe structural changes across lineages. We identify both global and local differences in chromosome topology through differentiation, identifying trends consistent across three human cell lines.

E-33: Unfolding the non-coding RNA landscape in pluripotency and differentiation
COSI: RegSys COSI
  • Rina Ben-El, Technion Institute of Technology, Israel
  • Yael Mandel, Technion Institute of Technology, Israel
  • Kashish Chetal, Cincinnati Children’s Hospital, United States
  • Nathan Salomonis, Cincinnati Children’s Hospital, United States

Short Abstract: Human Pluripotent Stem Cells (PSCs) are characterized by their self-renewal capacity and their ability to differentiate into every cell type in the body. Deciphering the molecular factors that are involved in stem cell fate decision is crucial to understanding human development. Transcription Factors (TFs) are known to play key roles in maintaining the stem cell state and in differentiation to each of the three germ layers. In recent years, long intergenic non-coding RNAs (lincRNAs) have emerged as regulators of pluripotency and differentiation. To study the role of lincRNAs in stem cell fate, we generated RNA-seq data from eleven time points during directed differentiation to cardiomyocytes. We clustered the lincRNAs with known markers of pluripotency and differentiation, which revealed over 500 differentially expressed non-coding RNAs putatively involved in different steps of cardiac differentiation. Furthermore, we explored the expression of alternatively spliced isoforms of TFs and lincRNAs throughout the experiment. We discovered dozens of TF and lincRNA genes whose alternative transcripts show a distinct pattern of expression during pluripotency and differentiation, suggesting an important role for alternative RNA processing in differentiation. Together this suggests a novel role for lincRNAs in stem cell fate decision.

E-34: Cell-ID: Cell identity prediction and gene signature extraction at single-cell resolution
COSI: RegSys COSI
  • Akira Cortal, Clinical Bioinformatics Lab, Institut Imagine, France
  • Antonio Rausell, Clinical Bioinformatics Lab, Institut Imagine - INSERM UMR-1163, France

Short Abstract: Low-dimensional representation of single-cell RNA-seq data (using e.g. Principal Component Analysis, Independent Component Analysis or t-SNE) is a common procedure for the identification of subpopulations of cells. However, standard techniques are limited to the analysis of cell similarities, neglecting the intrinsically associated low-dimensional representation of genes. Here we show that Multiple Correspondence Analysis can be adapted to produce a simultaneous representation of cells and genes within the same Euclidean space, where distances between both entities are assessed in a robust way. Genes may then be ranked for each cell to perform automatic cell identity and cell state prediction through gene set enrichment analysis. Moreover, per-cell gene rankings provide unbiased gene signatures for each cell, which proved valuable for estimating cell similarities across independent datasets and overcame batch effects arising from different technologies, tissues of origin and donors. The strategy is a major conceptual novelty in the field, as cell type identification is performed for each individual cell rather than on the basis of a pre-clustering step. This is a major advantage for the identification of rare or even unique cells with a potential role in disease. The approach is implemented as an R package with a user-friendly shiny interface.

E-35: Combining single-cell genomics with lineage tracing to study the lineage plasticity of vascular smooth muscle cells
COSI: RegSys COSI
  • Lina Dobnikar, University of Cambridge, Babraham Institute, United Kingdom

Short Abstract: Vascular smooth muscle cells (VSMCs) possess a remarkable capacity to change phenotype. They downregulate the contractile differentiation markers and increase migration, proliferation and secretion of proinflammatory cytokines in response to injury or inflammation. This process is misregulated in atherosclerosis. Only a small subset of cells respond in mouse models of atherosclerosis (Chappell et al, Circulation Research 2016), suggesting that VSMCs in healthy vessels may be functionally heterogeneous. Here, we combined single-cell RNA-seq and lineage tracing to profile gene expression in individual VSMCs from healthy mouse vessels. We found that VSMCs are heterogeneous with respect to several functionally relevant genes, including the progenitor marker Sca1. Sca1+ cells in healthy vessels are by themselves heterogeneous, with many of them devoid of expression of classic VSMC markers. Lineage tracing enabled us to confirm the VSMC-lineage identity of Sca1+ cells and identify a similar Sca1+ VSMC-derived subpopulation in atherosclerotic plaques. We confirmed that Sca1 upregulation marks VSMCs in the process of phenotypic switching in vitro and in vivo. Together, these analyses enabled the identification of a clinically relevant subset of VSMCs, and highlighted the power of combining single-cell transcriptomics with lineage tracing for studying phenotypically plastic cell populations in healthy tissue and disease.

E-36: Predicting enhancers in mammalian genomes using supervised hidden Markov models
COSI: RegSys COSI
  • Martin Vingron, Max Planck Institute for Molecular Genetics, Germany
  • Tobias Zehnder, Max Planck Institute for Molecular Genetics, Germany
  • Philipp Benner, Max Planck Institute for Molecular Genetics, Germany

Short Abstract: In order to predict the location of transcriptional regulators such as enhancers and promoters, many existing methods use the information on chromatin accessibility or histone modifications to train classifiers in order to segment the genome into functional groups such as enhancers and promoters. However, these methods often do not consider prior biological knowledge about enhancers such as their diverse lengths or molecular structure. We developed enhancer HMM (eHMM), a supervised hidden Markov model designed to learn the molecular structure of promoters and enhancers. Both consist of a central stretch of accessible DNA flanked by nucleosomes with distinct histone modification patterns. We evaluated the performance of eHMM within and across cell types and developmental stages and found that eHMM successfully predicts enhancers with high precision and recall comparable to state-of-the-art methods, and consistently outperforms those in terms of accuracy and resolution. In comparison to other ’black box’ methods eHMM's parameters are easy to interpret. eHMM can be used as a stand-alone tool for enhancer prediction without the need for additional training or a tuning of parameters. The high spatial precision of enhancer predictions gives valuable targets for potential knockout experiments or downstream analyses such as motif search.

E-37: Reconstruction of enhancer-target gene regulatory network integrating three-dimensional chromatin architecture
COSI: RegSys COSI
  • Elisa Salviato, IFOM, the FIRC Institute of Molecular Oncology, Italy
  • Judith Mary Hariprakash, IFOM, the FIRC Institute of Molecular Oncology, Italy
  • Francesco Ferrari, IFOM, the FIRC Institute of Molecular Oncology, Italy

Short Abstract: BACKGROUND: Enhancer-Target Gene (ETG) pairing is an open challenge in functional genomics owing to the lack of an exhaustive reference list of enhancers and to their location distant from the target gene. The solutions proposed so far do not effectively account for the most up-to-date knowledge of chromatin 3D architecture. METHODS: We defined active enhancers leveraging chromatin mark data across a large panel of human cell types. Target genes were predicted using orthogonal linear combinations of active chromatin mark enrichment at enhancers and promoters. The multi-scale hierarchy of chromatin 3D organization was taken into account based on Topologically Associating Domains (TADs) at different resolutions across various cell types, and the likelihood of intra-chromosomal interactions was converted to a set of weights for ETG pairing. RESULTS: We inferred a comprehensive list of ETG interactions, enhancing the accuracy of the prediction by incorporating the hierarchical structure of TADs. Benchmarking the algorithm against eQTL data, we obtain a significant association between TAD weights and validated pairs. CONCLUSIONS: With growing evidence for the key role of enhancers in the broader gene regulatory network, our approach will provide a valuable framework for elucidating the functional relationship of regulatory elements in the context of chromatin three-dimensional architecture.

E-38: HiCExplorer 3: A toolbox for Hi-C data analysis
COSI: RegSys COSI
  • Joachim Wolff, Albert-Ludwigs-Universitaet Freiburg, Germany
  • Leily Rabbani, Max Planck Institute of Immunobiology and Epigenetics Freiburg, Germany
  • Gautier Richard, Max Planck Institute of Immunobiology and Epigenetics Freiburg, Germany
  • Thomas Manke, Max Planck Institute of Immunobiology and Epigenetics Freiburg, Germany
  • Asifa Akhtar, Max Planck Institute of Immunobiology and Epigenetics Freiburg, Germany
  • Fidel Ramirez, Target Discovery Research Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
  • Björn Grüning, Albert-Ludwigs-Universitaet Freiburg, Germany
  • Rolf Backofen, Albert-Ludwigs-University Freiburg, Germany

Short Abstract: HiCExplorer is a toolbox to analyse and explore the 3D structure of the DNA based on Hi-C (high-throughput sequencing chromatin conformation capture) data. It provides tools for all steps of a Hi-C analysis, including creation of Hi-C interaction matrices, quality assessment, correction of Hi-C interaction matrices, and identification of A/B compartments, chromatin loops and topologically associating domains (TADs). Users can create publication-ready plots of the two-dimensional Hi-C interaction matrix and add A/B compartment and chromatin loop information. Additionally, pyGenomeTracks can be used to plot TADs on a selected genomic locus, along with additional information like gene tracks or ChIP-seq signals. Moreover, HiCExplorer supports the analysis of cHi-C data for promoter-enhancer interactions based on a background model. With the Galaxy HiCExplorer web server we provide computational resources, so that users with little bioinformatics background can perform every step of the Hi-C analysis in a simple-to-use web browser interface.

E-39: A framework for exhaustive modelling of logical genetic interactions using Petri nets
COSI: RegSys COSI
  • Olga Ivanova, Vrije Universiteit Amsterdam, Netherlands
  • Annika Jacobsen, Leiden University Medical Center, Netherlands
  • Saman Amini, Princess Máxima Center for Pediatric Oncology, Netherlands
  • Jaap Heringa, Vrije Universiteit Amsterdam, Netherlands
  • Patrick Kemmeren, Princess Máxima Center for Pediatric Oncology, Netherlands
  • K. Anton Feenstra, Vrije Universiteit Amsterdam, Netherlands

Short Abstract: Motivation: Genetic interaction (GI) patterns are characterized by the phenotypes of interacting single and double mutated gene pairs. Uncovering the regulatory mechanisms of GIs would provide a better understanding of their role in biological processes, diseases, and drug response. For a defined set of factors computational analyses can provide insights into the underpinning mechanisms of GIs as genomic regulatory elements. Results: In this study, we present a framework for exhaustive modelling of GI patterns using Petri nets (PN). Four-node models were defined with restrictions and generated on three levels to enable an exhaustive approach. Simulations suggest ~5 million models of GIs for further analysis. We propose putative mechanisms for the GI patterns, inversion and suppression, by using generalized topologies and frequent edges and edge weights. Our results demonstrate that exhaustive PN modelling can be applied to reason about mechanisms of GIs when only the phenotypes of the interacting gene pairs are known. The framework can be applied to other GI or genetic regulatory datasets. Further analysis of the network models generated will help chart plausible evolutionary paths of such regulatory modules.
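
For readers new to Petri nets, the token-firing rule that such models rest on can be sketched in a few lines. This is only an illustrative sketch, not the authors' framework: the place names (gene_A, gene_B, phenotype), the arc weights, and the helper names `enabled` and `fire` are all hypothetical.

```python
from typing import Dict, Tuple

# A transition is (inputs, outputs); each maps a place name to an arc weight.
Transition = Tuple[Dict[str, int], Dict[str, int]]


def enabled(marking: Dict[str, int], transition: Transition) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking.get(place, 0) >= weight for place, weight in inputs.items())


def fire(marking: Dict[str, int], transition: Transition) -> Dict[str, int]:
    """Return the marking after consuming input tokens and producing output tokens."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new_marking = dict(marking)
    for place, weight in inputs.items():
        new_marking[place] = new_marking.get(place, 0) - weight
    for place, weight in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + weight
    return new_marking


# Hypothetical example: both genes must be present to produce the phenotype.
both_required: Transition = ({"gene_A": 1, "gene_B": 1}, {"phenotype": 1})
wild_type = {"gene_A": 1, "gene_B": 1, "phenotype": 0}
single_mutant = {"gene_A": 0, "gene_B": 1, "phenotype": 0}  # gene_A deleted

after = fire(wild_type, both_required)
```

In this toy setting a deletion mutant is simply a marking with no tokens on the deleted gene, so the transition cannot fire and the phenotype place stays empty.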

E-40: Genetic regulation across 1,300 human induced pluripotent stem cell lines
COSI: RegSys COSI
  • Marc Jan Bonder, EMBL, Germany
  • Craig Smail, Stanford, United States
  • Stephen Montgomery, Stanford, United States
  • Oliver Stegle, DKFZ, Germany
  • Kelly Frazer, University of San Diego, United States

Short Abstract: Human induced pluripotent stem cells (iPSC) are a powerful system to assay the molecular impact of genetic variants. Recent studies have shown that iPSCs provide a unique system to study genetic effects, uncovering effects linked to diseases in unique cellular and developmental contexts. To facilitate this, we have assembled the largest meta-cohort of iPSC (the i2QTL-study). We collected data on 1,001 individuals with both genotype and RNA-sequencing data. We identified eQTLs for 68% of the expressed protein coding genes (FDR<5%). In addition to gene-level effects, we mapped cis-QTLs at the transcript, exon, splicing and 3’UTR levels. In total, 22,223 genes were found to be under genetic regulation. By overlapping our QTLs with GWAS signals, we find strong co-localization with over 550 traits. For instance, for coronary artery disease we observe GWAS loci linked to each of the quantified expression levels. In addition to cis-QTLs, we also quantified trans-eQTLs: in total, 1,077 trans-eQTLs linked to 199 trans-eGenes were identified (FDR<10%). Most of the trans-eQTLs are linked to cis-eQTL variants, but we also found trans-eQTLs from GWAS variants. The i2QTL-study is unique in its large sample size and in its extensive data, enabling the first comprehensive map of regulatory variants in iPSC.

E-41: Characterizing a focused landscape of Familial Acute Respiratory Distress Syndrome
COSI: RegSys COSI
  • Inimary Toby, University of Dallas, United States
  • Joanna Floros, The Pennsylvania State University College of Medicine, United States
  • Nithyananda Thorenoor, The Pennsylvania State University College of Medicine, United States

Short Abstract: Acute respiratory distress syndrome (ARDS) affects approximately 190,600 patients per year in the United States, with mortality up to 45%. Despite improvements in intensive care during the last fifteen years, ARDS is still a morbid and life-threatening condition. Presently, ARDS diagnosis relies on comprehensive indicators owing to a lack of simple and reliable criteria. In fact, ARDS diagnosis has seen limited progress since its initial description in 1967 and management is still based largely on supportive care, with no established therapies targeted at the primary disease processes. Thus, there is a pressing need for methods of early detection and treatment. Previous genomic studies in ARDS have focused on characterization of subjects unrelated to each other. The strategy for this study was to focus on a structured landscape in order to elucidate underlying inheritance patterns of “private variants”. The 3,516 SNPs identified demonstrate that there are important biological pathways that distinguish ARDS cases from one another. The data also show a coordinated effort amongst signaling processes that underlie the pathogenesis of ARDS. Results from validation of variants could be leveraged for clinical monitoring in cases for which family history indicates the presence of a genetic inheritance pattern of ARDS.

E-42: scds: Computational Annotation of Doublets in Single Cell RNA sequencing Data
COSI: RegSys COSI
  • Abha Bais, University of Pittsburgh, United States
  • Dennis Kostka, University of Pittsburgh, United States

Short Abstract: Single cell RNA-sequencing (scRNA-seq) technologies employing micro-fluidics in combination with unique molecular identifiers enable the study of thousands of cells per experiment. However, these methods sometimes wrongly consider two or more cells as a single cell, yielding doublets. These errors can severely confound interpretation of downstream results and hence computational strategies are needed for efficient doublet detection. We present single cell doublet scoring (scds), an approach for doublet identification in scRNA-seq data encompassing two new and complementary methods, co-expression based doublet scoring (cxds) and binary classification based doublet scoring (bcds), as well as their combination (hybrid). The co-expression based approach, cxds, utilizes binarized gene expression data and employs a binomial model for the co-expression of pairs of genes. bcds, on the other hand, uses a binary classification approach to discriminate artificial doublets from the original data. Performance evaluation using four data sets and multiple existing approaches demonstrates that our proposed methods perform well and that no approach dominates all others. We also find appreciable differences between doublet detection methods across data sets and believe there is room for improvement as more data becomes available. In the meantime, scds presents a scalable, competitive approach that enables doublet annotations in thousands of cells in seconds.
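
The idea of training a classifier against artificial doublets can be illustrated with a small sketch. The abstract does not say how scds constructs its artificial doublets; summing the count vectors of two randomly chosen cells is an assumption made here for illustration, and the function name `make_artificial_doublets` is hypothetical.

```python
import random
from typing import List


def make_artificial_doublets(
    counts: List[List[int]], n_doublets: int, seed: int = 0
) -> List[List[int]]:
    """Simulate doublets by adding the gene counts of two distinct random cells.

    counts is a cells x genes matrix of non-negative integer counts.
    Assumption: a doublet's profile is the plain sum of its two parent cells.
    """
    if len(counts) < 2:
        raise ValueError("need at least two cells to form a doublet")
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(counts)), 2)  # two distinct cell indices
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets


# Toy matrix: 3 cells x 3 genes.
cells = [[1, 0, 2], [0, 3, 0], [4, 0, 0]]
simulated = make_artificial_doublets(cells, n_doublets=5)

# Labels for a downstream binary classifier: 0 = observed cell, 1 = artificial doublet.
features = cells + simulated
labels = [0] * len(cells) + [1] * len(simulated)
```

Any binary classifier could then be trained on `features` and `labels`; observed cells that score as doublet-like would be the candidates for annotation.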

E-43: Predicting individual variation in chromatin architecture from RNA-Seq data
COSI: RegSys COSI
  • Lucas van Duin, University of Copenhagen, Denmark
  • Sarah Rennie, University of Copenhagen, Denmark
  • Robin Andersson, The Bioinformatics Centre, University of Copenhagen, Denmark

Short Abstract: The amount of transcription at genomic loci may be determined by two classes of mechanisms: Those that act on single genes, and those that act on multiple genes within a genomic neighbourhood, influenced by local three-dimensional chromatin architectures. How much of the transcriptional output is caused by each of these two classes of mechanisms is hard to determine. Previously, we have inferred the magnitude of these two components along the genome, using a Bayesian hierarchical model on transcriptional data. On CAGE (Cap Analysis of Gene Expression) data, this transcriptional decomposition approach accurately revealed chromatin compartments and boundaries of active topologically associating domains (Rennie et al., 2018). Our decomposition approach can also successfully be used on RNA-Seq data, shown by the high similarity to the components derived from CAGE data. Here, we applied transcriptional decomposition to the GEUVADIS data set (Lappalainen et al., 2013), to determine if differences in the PD component can be used to infer individual variation in chromatin architecture. We identify regions showing distinct component differences across different populations and link them to genetic variants. Overall, our work provides yet unobserved insights into the link between transcription, three-dimensional architectures and genetic variation across large groups of individuals.

E-44: A dynamic landscape of enhancer clusters and core regulatory circuitries in mouse development
COSI: RegSys COSI
  • Anthony Mathelier, NCMM, University of Oslo, Norway
  • Aziz Khan, NCMM, University of Oslo, Norway

Short Abstract: Transcriptional enhancers form clusters (also known as super-enhancers) that can activate tissue-specific gene expression and play an important role in the control of cell identity, development, and disease. Recent studies have characterized these enhancers and core regulatory circuitries in several cell types and tissues. However, their global contribution in the context of development in vivo remains unclear. Here, we used public ChIP-seq data for eight histone modifications with chromatin accessibility (ATAC-seq) data to characterize super-enhancers at eight mouse embryonic developmental stages across 12 different tissues to uncover their genome-wide dynamic landscape during development. We further studied the dynamics of super-enhancers and their constituents during development and how that affects gene expression using RNA-seq data. We show that these clusters of enhancers are development stage-specific and importantly we pinpoint development stage-specific core regulatory circuitries which dominate the control of the gene expression programs. We envision that this study will contribute to understanding the dynamic role of individual enhancers within super-enhancers in the context of mouse development in vivo.

E-45: Disentangling transcription factor binding site complexity
COSI: RegSys COSI
  • Ralf Eggeling, University of Tübingen, Germany

Short Abstract: The binding motifs of many transcription factors (TFs) comprise a higher degree of complexity than a single position weight matrix model permits. Additional complexity is typically taken into account either as intra-motif dependencies via more sophisticated probabilistic models or as heterogeneities via multiple weight matrices. However, both orthogonal approaches have limitations when learning from in vivo data where binding sites of other factors in close proximity can interfere with motif discovery for the protein of interest. In this work, we demonstrate how intra-motif complexity can, purely by analyzing the statistical properties of a given set of TF-binding sites, be distinguished from complexity arising from an intermix with motifs of co-binding TFs or other artifacts. In addition, we study the related question whether intra-motif complexity is represented more effectively by dependencies, heterogeneities or variants in between. Benchmarks demonstrate the effectiveness of both methods for their respective tasks and applications on motif discovery output from recent tools detect and correct many undesirable artifacts. These results further suggest that the prevalence of intra-motif dependencies may have been overestimated in previous studies on in vivo data and should thus be reassessed.
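
As background for the baseline model mentioned above, the sketch below scores a candidate site against a single position weight matrix. It shows the independence assumption that intra-motif dependency models relax: the score is a plain sum over positions. The two-position matrix, the uniform background of 0.25, and the function name `log_odds_score` are illustrative assumptions, not taken from the poster.

```python
import math
from typing import Dict, List


def log_odds_score(
    site: str, pwm: List[Dict[str, float]], background: float = 0.25
) -> float:
    """Sum of per-position log2 odds; each position contributes independently."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the number of PWM columns")
    return sum(math.log2(pwm[pos][base] / background) for pos, base in enumerate(site))


# Hypothetical two-position motif; each column sums to 1.
toy_pwm = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.5, "T": 0.25},
]

best = log_odds_score("AG", toy_pwm)     # both positions are 2x enriched over background
neutral = log_odds_score("CT", toy_pwm)  # both positions match the background frequency
```

Because the score factorises over positions, a single matrix cannot express that, say, a G at position 2 is preferred only when position 1 is an A; that is the kind of complexity that dependency models or multiple matrices are meant to capture.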

E-46: Chromatin remodeling regulates DNA replication timing
COSI: RegSys COSI
  • Johnathan Whetstine, Harvard University, United States
  • Ruslan Sadreyev, Harvard University, United States

Short Abstract: Although chromatin remodeling and DNA replication have been topics of extensive research, their causal relationship is not fully understood. This relationship has important fundamental and clinical implications. Here we generated and analyzed high-resolution temporal data on DNA replication, dynamics of multiple histone modifications (ChIP-seq), chromatin accessibility (ATAC-seq), and gene expression (RNA-seq) during cell cycle, combined with public Hi-C data in human cells. We found that temporal dynamics of chromatin accessibility and histone modifications are strongly associated with patterns of replication at a given genomic region during cell cycle (regression R=0.94). We then perturbed chromatin states by depleting or overexpressing chromatin modifying proteins, e.g. histone demethylase KDM4A, and analyzed effects of these perturbations on replication timing. KDM4A overexpression affected replication across as much as 15% of the genome. The coordinated changes in the levels of multiple histone modifications caused by KDM4A overexpression were associated with specific predictable shifts to earlier or later replication at a given genomic region. Unexpectedly, these shifts were not quantitatively associated with the factors previously suggested to affect replication: differential gene expression, original TAD structure, or original replication timing. These results suggest that chromatin modifying proteins can function as important regulators of DNA replication during cell cycle.

E-47: Evolutionary analysis of cis-regulatory elements, chromatin changes and gene expression in the central nervous system of primates
COSI: RegSys COSI
  • Abusaid Shaymardanov, National Research University Higher School of Economics, Russia
  • Dmitry Svetlichnyy, National Research University Higher School of Economics, Russia

Short Abstract: Investigating gene expression evolution in brain tissue together with the divergence of genomic regulatory elements in a phylogenetic context is important to decipher genes that might shape phenotypic differences. In this study, we have investigated the evolution of gene expression levels in the context of sequence changes in the cis-regulatory elements. To this end, we performed a combined analysis of ChIP-seq [1] and RNA-seq [2] data for eight brain regions of primates. Using RNA-seq data we identified genes whose expression was evolving through drift, stabilizing selection, or a lineage-specific shift. Next, we analyzed the divergence of cis-regulatory signals at the sequence level, using the strength of the binding site for transcription factors as a quantitative trait and employing an Ornstein–Uhlenbeck process. We linked the obtained predictions with experimental data reflecting the presence of an active chromatin mark (H3K27Ac) and identified both genomic regions and transcription factors (TFs) operating in these loci. Overall, we identified genes and predicted candidate regulatory TFs whose binding demonstrates a lineage-specific shift.
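
For orientation, an Ornstein–Uhlenbeck process describes a trait X that is pulled toward an optimum theta with strength alpha while diffusing with intensity sigma: dX = alpha (theta - X) dt + sigma dW. The sketch below simulates one trajectory with a simple Euler–Maruyama step. It is a generic illustration of the process, not the authors' phylogenetic model; the parameter values and the function name `simulate_ou` are assumptions.

```python
import random
from typing import List


def simulate_ou(
    x0: float,
    theta: float,
    alpha: float,
    sigma: float,
    dt: float,
    n_steps: int,
    seed: int = 0,
) -> List[float]:
    """Euler-Maruyama simulation of dX = alpha * (theta - X) dt + sigma dW."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        drift = alpha * (theta - x) * dt               # pull toward the optimum
        noise = sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)  # random drift term
        x += drift + noise
        path.append(x)
    return path


# With sigma = 0 the trait relaxes deterministically toward theta.
deterministic = simulate_ou(x0=0.0, theta=1.0, alpha=0.5, sigma=0.0, dt=0.1, n_steps=200)

# With sigma > 0 the trait fluctuates around theta.
noisy = simulate_ou(x0=0.0, theta=1.0, alpha=0.5, sigma=0.2, dt=0.1, n_steps=200)
```

In comparative analyses, pure drift corresponds to alpha = 0, stabilizing selection to alpha > 0 with one shared optimum, and a lineage-specific shift to a different optimum on one branch.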

E-48: Transcription factor binding to mismatched DNA and its potential role in mutagenesis
COSI: RegSys COSI
  • Harshit Sahay, Duke University, United States
  • Ariel Afek, Duke University, United States
  • Honglue Shi, Duke University, United States
  • Atul Rangadurai, Duke University, United States
  • Hashim Al-Hashimi, Duke University, United States
  • Raluca Gordan, Duke University, United States

Short Abstract: DNA sequence and shape are known to be important for transcription factor (TF)-DNA recognition. Still, some fundamental aspects of this recognition are poorly understood. Structures of TF-bound DNA show significant distortions from B-form DNA, and the implications of these distortions on specificity have not been characterized. Additionally, while TFs bound to lesioned DNA are believed to act as roadblocks for repair, the effects of these lesions on TF binding are unknown. Here, we focus on DNA mismatches, a type of lesion frequently generated in the cell. We present the first high-throughput assay to measure the effects of mismatches on TF binding. Mismatches can cause significant distortions in B-DNA, and thus present a way to characterize the effects of conformational penalties on TF binding. Our results show that mismatches have a widespread impact on binding across many TF families, which is not explained by sequence alone. We find that mismatches that increase TF binding generally exhibit geometries similar to distorted base-pairs in TF-bound structures. TFs can compete with repair enzymes for these mismatched sequences, eventually causing mutations. Focusing on c-Myc and a T-G mismatch that increases binding by >30-fold, we show that the resulting mutation is highly enriched in cancer genomes.

E-49: The effects of UV damage on transcription factor binding to DNA
COSI: RegSys COSI
  • Zachery Mielko, Duke University, United States
  • Ariel Afek, Duke University, United States
  • Raluca Gordan, Duke University, United States

Short Abstract: DNA damage is enriched in transcription factor (TF) binding sites due to modulation of the rate of damage and the rate of repair. The inhibition of DNA repair is thought to be a result of TFs competing with repair enzymes. If there is competition, it implies that TFs bind to damaged DNA. However, it has not been established whether TFs can interact with damaged DNA, or more generally how damage influences TF specificity. We collected high-throughput data for 11 TFs using a novel methodology, UV PBM, which measures how UV DNA damage affects the binding specificity of TF proteins. Our method gives highly reproducible results, with a squared Pearson correlation of 0.98 for damage measurements and a median correlation of 0.86 for TF binding, across all proteins tested. We found that damage can increase or decrease TF binding, and the effects are protein specific and dependent on the position of damage relative to the binding site. Our new binding data will be used to infer whether sites that are still strongly bound by TFs after UV damage exhibit decreased repair and increased mutations.
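
The reproducibility figures quoted above are squared Pearson correlations between replicate measurements. A minimal sketch of that calculation follows; the replicate values are made up for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation, as used for replicate agreement."""
    return pearson_r(x, y) ** 2


# Made-up probe intensities from two hypothetical replicates.
replicate_1 = [1.0, 2.0, 3.0, 4.0]
replicate_2 = [2.0, 4.0, 6.0, 8.0]
agreement = r_squared(replicate_1, replicate_2)
```

Squaring removes the sign, so the value reports how much of the variance in one replicate is linearly explained by the other.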

E-50: Identifying cis-regulatory regions for targeted therapeutic expression
COSI: RegSys COSI
  • Oriol Fornes, The University of British Columbia, Canada
  • Tamar V. Av-Shalom, The University of British Columbia, Canada
  • Rachelle A. Farkas, The University of British Columbia, Canada
  • Andrea J. Korecki, The University of British Columbia, Canada
  • Michelle Kang, The University of British Columbia, Canada
  • Philip A. Richmond, The University of British Columbia, Canada
  • David J. Arenillas, The University of British Columbia, Canada
  • Siu Ling Lam, The University of British Columbia, Canada
  • Elizabeth M. Simpson, The University of British Columbia, Canada
  • Wyeth W. Wasserman, The University of British Columbia, Canada

Short Abstract: Expression of the delivered gene in cells other than those clinically relevant (i.e. off-target) is undesirable and a major concern in gene therapy applications. Knowledge of cis-regulatory regions (CRRs) can be used to design cis-regulatory sequences (CRSs) that restrict the expression of the delivered gene to the target cells. OnTarget automates the design of selective CRSs. It incorporates three components: 1) an underlying data repository, GUD (Genomic Universal Database), linking thousands of public regulatory genomics datasets from hundreds of human samples; 2) a module to identify CRRs potentially controlling the expression of genes in sample-specific contexts; and 3) a module to link CRRs to genes (under development). Identification of CRRs is done as follows: 1) select samples relevant to the target cells; 2) identify one gene differentially expressed in these samples; and 3) predict the CRRs of that gene based on a combination of regulatory genomic features and conservation. Identified CRRs are linked to the gene based on correlation of transcription factor binding. The CRS is achieved by assembling an optimal subset of the CRRs. As an example, we show the application of OnTarget to design CRSs targeting specific cells within the brain and eye.

E-51: Characteristic of human putative transcriptional target genes and discovery of biased orientation of DNA motifs affecting transcription of genes
COSI: RegSys COSI
  • Naoki Osato, Osaka University, Japan

Short Abstract: To find DNA binding motif sequences of transcription factors (TFs), such as CTCF, that affect the interactions between enhancers and promoters of genes and their expression, I predicted human transcriptional target genes of TFs bound in open chromatin regions in enhancers and promoters in monocytes and other cell types. Transcriptional target genes were predicted based on EPA. EPA was shortened at the genomic locations of forward-reverse (FR) or reverse-forward (RF) orientation of DNA binding motifs of a TF, which is characteristic of CTCF binding sites at chromatin interaction anchors. The expression level of target genes of a TF predicted based on EPA was compared with that of target genes of the same TF predicted from promoters only, in order to estimate the enhancer activity of the TF. By comparing the expression levels according to the criteria of EPA, I found that about two hundred DNA motifs with biased (FR or RF) orientation significantly affected the expression level of putative transcriptional target genes in monocytes of four people in common, and also in T cells. Moreover, enhancer-promoter interactions (EPI) predicted using FR or RF orientation of DNA motifs overlapped with chromatin interaction data (HiChIP and Hi-C) more than those predicted using the other criteria of EPA.

E-52: Analyzing Motif Combinations In Enhancer-Promoter Interactions
COSI: RegSys COSI
  • Mu Yang, National Taiwan University, Taiwan
  • Yen-Jen Oyang, National Taiwan University, Taiwan
  • Chen Chien-Yu, National Taiwan University, Taiwan

Short Abstract: Transcriptional activity is controlled by regulatory elements including enhancers and promoters. While enhancer-promoter interactions (EPIs) are a well-known phenomenon, their mechanism is still obscure. A previous study by Whalen et al. (2016) constructed a model to predict EPIs from functional genomics signals, but Xi et al. (2018) subsequently pointed out that its high accuracy was sharply diminished after adjusting for shared features that leaked information to the test set. In this study, we used the Hi-C data of six cell lines, collected and sorted into positive and negative enhancer-promoter pairs by Whalen et al. To avoid the pitfalls referred to by Xi et al., we looked at the positive and negative enhancers that interact with the same promoter. We are interested in understanding whether motifs or motif combinations play a role in determining the interaction. We scanned the sequences to find motifs, and tested whether positive and negative enhancers have different motif composition. We found that some motifs showed statistically significant differences between positive and negative enhancers. However, different promoters involved different sets of motifs in enhancers, and common motifs were infrequent. Our results suggest that motif-related features in enhancers deserve further study.

E-53: Discovering Structural Units of Chromosomal Organization with Matrix Factorization and Graph Regularization
COSI: RegSys COSI
  • Da-Inn Lee, University of Wisconsin-Madison, United States
  • Sushmita Roy, University of Wisconsin-Madison, United States

Short Abstract: The three-dimensional (3D) organization of the genome is an important layer of regulation in developmental, disease, and evolutionary processes. Hi-C is a high-throughput chromosome conformation capture (3C) assay used to study the 3D genome by measuring pairwise interactions of genomic loci. Analysis of Hi-C data has shown that the genome is organized into higher-order organizational units such as compartments and topologically associating domains (TADs). Recent comparisons of TAD-finding methods found them to be unstable to different resolutions and sparsity levels of Hi-C data, suggesting the need for more robust methods. We present GRiNCH, a graph-regularized Non-negative Matrix Factorization (NMF) approach to identifying organizational units of chromosomes from Hi-C data. GRiNCH uses graph regularization to encourage neighboring genomic regions to belong to the same low-dimensional space. GRiNCH can recover TAD-like clusters which are significantly enriched in architectural protein binding in the boundaries and are more stable to sparse and low-depth Hi-C datasets than existing methods. Finally, GRiNCH can use the low-dimensional NMF factors to impute missing interaction counts and offer a smoothed Hi-C matrix. Taken together, GRiNCH offers a promising approach to identifying biologically meaningful structural domains of the genome.

E-54: Discovering Gene Regulatory Network Rewiring Associated to Drug Response
COSI: RegSys COSI
  • Charles Blatti, University of Illinois at Urbana-Champaign, United States
  • Sihai Dave Zhao, University of Illinois at Urbana-Champaign, United States
  • Krishna Kalari, Mayo Clinic, United States
  • Winston Tan, Mayo Clinic, United States
  • Richard Weinshilboum, Mayo Clinic, United States
  • Liewei Wang, Mayo Clinic, United States
  • Mikel Heranez, University of Illinois at Urbana-Champaign, United States

Short Abstract: Background: Many methods exist for constructing Gene Regulatory Networks (GRNs) from transcriptomics data; however, fewer report key changes in the GRNs between groups of samples with distinct phenotypes (e.g. treatment vs. control). Of these, most focus on identifying differentially expressed genes in connected global pairwise-correlation networks. Methods: Our approach for GRN identification searches for strong evidence for network rewiring between phenotypes where such rewiring has significant impact on the overall co-expression of high-confidence gene modules. The method first uncovers modules whose overall gene expression is explained by the combination of a limited number of regulators, then scores these modules for evidence of significant rewiring between the two phenotype groups, and finally mines these ‘disturbed’ modules for the strongest individual network changes. Results: We applied our approach to expression data from the PROMOTE study at Mayo Clinic of 68 patient samples of metastatic castration-resistant prostate cancer. The patients were separated into responder and non-responder groups after three months of drug treatment. We found gene modules with significant regulatory rewiring between these groups and uncovered many phenotype-specific network edges involving TFs such as FOXD3, ELF5, and SALL2, which have previously been implicated in prostate cancer progression.

E-55: A Bayesian Markov models-based motif discovery tool for predicting motifs in nucleotide sequences and its web server
COSI: RegSys COSI
  • Wanwan Ge, MPI-BPC, Germany
  • Anja Kiesel, Ludwig Maximilian University of Munich, Germany
  • Christian Roth, MPI-BPC, Germany
  • Johannes Soeding, MPI-BPC, Germany

Short Abstract: We developed a Bayesian approach for motif discovery using Markov models (BaMMs) which learn nucleotide dependencies within motifs for transcription factor (TF) binding. BaMMs efficiently prevent overfitting by automatically adapting model complexity to the amount of available data, so that models up to order k can still be estimated reliably. We have shown that higher-order Bayesian Markov Models perform substantially better in ROC analyses than position weight matrices (PWMs) [Siebert M., NAR, 2016]. To make higher-order BaMMs with improved quality available to the community and to combine various standard analyses, we developed the BaMM webserver. The BaMM webserver offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a sequence set for motif occurrences, (iii) searching with a motif for similar motifs in our BaMM databases, and (iv) browsing and keyword searching in the databases. Our motif database contains motifs for over 1000 TFs, trained from ChIP-seq databases for human, mouse and other organisms. In contrast to servers such as JASPAR and HOCOMOCO, we represent sequence motifs not by PWMs but by BaMMs of order 4. The BaMM server is freely accessible at https://bammmotif.mpibpc.mpg.de [Kiesel A., NAR, 2018].

E-56: Predicting and elucidating transcriptional activity from sequence with Deep Learning
COSI: RegSys COSI
  • Nicolas Alcaraz, The Bioinformatics Centre, University of Copenhagen, Denmark
  • Robin Andersson, The Bioinformatics Centre, University of Copenhagen, Denmark

Short Abstract: The correct function of gene regulatory elements and their interplay are essential for the precisely coordinated transcriptional activities within a cell. GWAS studies have shown that the majority of trait-associated genetic variants do not affect coding sequence, but are highly enriched in regulatory regions (enhancers and gene promoters) and affect transcriptional regulatory elements. We develop and apply state-of-the-art deep learning models to predict transcriptional output solely from DNA sequence. We combine the power of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) and train our models on Cap Analysis of Gene Expression (CAGE) datasets. CAGE has the advantage over other expression profiling methods, such as RNA-seq, that it can determine the exact transcriptional start site (TSS), hence giving a precise view of promoter usage as well as enhancer transcription. Our model outperforms more classical machine learning methods based on predefined k-mer features. Furthermore, we address the “black box” problem of deep learning by using novel methods such as DeepLift to extract the most relevant sequence features for the prediction. We also assess and validate the impact of sequence variants within the relevant features by matching them to known disease variant databases.

E-57: Identifying transcription factors linked to gene expression changes during a key cell state transition in mouse red blood cell development
COSI: RegSys COSI
  • Rob Beagrie, MRC Weatherall Institute of Molecular Medicine, United Kingdom
  • Daniel Hidalgo, UMass Medical School, United States
  • Merav Socolovsky, UMass Medical School, United States
  • Doug Higgs, MRC Weatherall Institute of Molecular Medicine, United Kingdom

Short Abstract: In mouse red blood cell development (erythropoiesis), early progenitors transition to terminal differentiation by passing through a highly specialised cell cycle. During this cell cycle, DNA is replicated faster than in preceding or following cycles due to accelerated replication forks and is globally demethylated. Recent single-cell RNA-seq data show that this cell cycle also coincides with rapid changes in gene expression, likely indicating a cell-state transition. We assayed open chromatin from primary cells at various stages in red blood cell development using ATAC-seq and found that the entry to terminal differentiation is also associated with major changes in enhancer accessibility. We used SeqGL, a quantitative model using a k-mer feature representation and group lasso regularization, to identify transcription factor motifs enriched in enhancer sequences at each stage of red blood cell differentiation. By using the model to predict gene expression changes, we are able to distinguish motifs likely to be associated with changes in chromatin accessibility from those that might drive transcriptional upregulation. These predictions can then be tested by interfering with the levels of transcription factors known to bind to each identified motif.
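
A k-mer feature representation turns each sequence into counts of its overlapping length-k substrings. The sketch below shows only this plain counting step; the exact feature set used by SeqGL is not described in the abstract, and the function name `kmer_counts` and the toy sequence are assumptions.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a DNA sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy enhancer fragment; a real analysis would use one count vector per ATAC-seq peak.
features = kmer_counts("acgacg", k=3)
```

Stacking such count vectors over all peaks gives the matrix on which a regularized model, such as a group lasso, can select informative k-mers.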

E-58: Functional analysis of image-based cell profiling from high throughput cancer target screening
COSI: RegSys COSI
  • Euna Jeong, Sookmyung Women's University, South Korea
  • Sukjoon Yoon, Sookmyung Women's University, South Korea

Short Abstract: Current high-throughput technologies enable image-based cell profiling for discovering phenotypic differences resulting from diverse gene knockdowns. Morphological features measured under gene perturbation can be used to identify gene functions, cancer-specific phenotypes, and drug targets, thus accelerating their clinical application and patient stratification. This study focused on quantitative image analysis to detect changes in cell physiology such as shape, area, intensity, and texture. About 1200 image-based measures were simultaneously collected from image-based high-content assays using CellProfiler. We optimized those image parameters based on their variation and divided genes into two sets depending on the significance of the decrease in cell count. Furthermore, we applied functional gene set analysis to the two gene sets and identified a few functional groups according to cellular physiological changes in each set. For the selected gene groups, decision tree-based classification was performed to dissect image signatures based on the correlation between image parameters and functional gene groups. This system-level analysis of image-based parameters provides new insight for a multitude of applications and better biological interpretation of high-content cell-based assays.

E-59: LIQUORICE: Coverage bias correction for the analysis of epigenetic signatures in liquid biopsies
COSI: RegSys COSI
  • Peter Peneder, Children’s Cancer Research Institute, St. Anna Kinderkrebsforschung, 1090 Vienna, Austria, Austria
  • Adrian Stütz, Children’s Cancer Research Institute, St. Anna Kinderkrebsforschung, 1090 Vienna, Austria, Austria
  • Christoph Bock, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Austria
  • Eleni M. Tomazou, Children’s Cancer Research Institute, St. Anna Kinderkrebsforschung, 1090 Vienna, Austria, Austria

Short Abstract: Liquid biopsies, the analysis of cell-free DNA (cfDNA) or RNA in body fluids, can provide higher-resolution data than traditional biopsies, allowing for earlier detection of relapse and treatment response. The recent discovery of cfDNA fragments as “nucleosome footprints” enables efficient epigenomic interrogation of cfDNA via conventional next-generation sequencing (NGS). Several studies have demonstrated the value of this approach, for example for the inference of tissue-of-origin, but have not focused on sequencing biases: whether a fragment is sequenced and properly mapped is influenced by its nucleotide composition, especially by its GC-content and the mappability of the corresponding read-pair. These biases can confound the final quantification of the epigenetic signature strength. Here, we present a user-friendly tool called LIQUORICE, which allows for the correction of coverage biases, taking into account the particular fragment size distribution of cfDNA (multimodal, different between samples). We employed LIQUORICE on a large dataset of liquid biopsies from Ewing sarcoma (EwS) patients, and quantified the epigenetic signature strength of EwS based on the bias-corrected decrease in coverage around EwS-specific DNase I hypersensitive sites. This approach allowed for the accurate distinction between EwS patients and controls, and could provide clinically relevant information about tumor heterogeneity in the future.

E-60: NGseqBasic - a single-command UNIX tool for ATAC-seq, DNaseI-seq, Cut-and-Run, and ChIP-seq data mapping, high-resolution visualisation, and quality control
COSI: RegSys COSI
  • Jelena Telenius, University of Oxford, Weatherall Institute of Molecular Medicine, United Kingdom
  • The Wigwam Consortium, Weatherall Institute of Molecular Medicine, Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom
  • Jim R Hughes, University of Oxford, Weatherall Institute of Molecular Medicine, United Kingdom

Short Abstract: NGseqBasic is an easy-to-use single-command analysis tool for chromatin accessibility (ATAC, DNaseI) and ChIP sequencing data, which also supports newer techniques such as low cell number sequencing and Cut-and-Run. It takes in fastq, fastq.gz or bam files, conducts all quality control, trimming and mapping steps, along with quality control and data processing statistics, and combines all of this into a single-click loadable UCSC data hub, with an integrated statistics HTML page providing detailed reports from the analysis tools and quality control metrics. The tool is easy to set up, and no installation is needed. A wide variety of parameters are provided to fine-tune the analysis, with optional settings to generate DNase footprint or high-resolution ChIP-seq tracks. A tester script is provided to help with the setup, along with a test data set and downloadable example user cases. NGseqBasic has been used for the analysis of next generation sequencing data in high-impact publications. The code is actively developed, under Git version control, and hosted in a GitHub code repository. Availability: Download, setup and help instructions are available on the NGseqBasic web site http://userweb.molbiol.ox.ac.uk/public/telenius/NGseqBasicManual/external/ Bioconda users can load the tool as library “ngseqbasic”. The source code with Git version control is available at https://github.com/Hughes-Genome-Group/NGseqBasic/releases.

E-61: Machine learning approach in resolving the riddle of absent expression correlation of transcription factors and their targets.
COSI: RegSys COSI
  • Adam Zaborowski, Max Planck Institute of Molecular Plant Physiology, Germany
  • Dirk Walther, Max Planck Institute of Molecular Plant Physiology, Germany

Short Abstract: Transcription factors are among the major players in the regulation of gene expression. However, contrary to expectation, the average expression correlation coefficient between transcription factors (TFs) and their targets is hardly different from that between transcription factors and non-target genes. Nonetheless, the overall distribution may hide the presence of a subset of genes that follow a simple regulatory model in which the expression of a gene can be predicted from the expression of the cognate transcription factor. For Arabidopsis thaliana, we created a set of regulatory pairs based on available DAP-seq experimental data and 5000 expression microarrays. For each pair, a multitude of genomic features, along with data from high-throughput experiments, were collected. With machine learning algorithms, we identified those features that are associated with highly correlated pairs. The chosen features concern physical properties of TFs, such as the length of the protein and 5’ UTR, expression variation of genes and TFs, along with other characteristics. Using this information, one can predict from genomic features which pairs are likely to be correlated in terms of expression, and use this to reconstruct gene regulatory networks more reliably.

E-62: Statistical inference for overlaps of ChIP-seq experiments
COSI: RegSys COSI
  • Alena van Bömmel, Max Planck Institute for Molecular Genetics, Germany

Short Abstract: Chromatin immunoprecipitation combined with sequencing (ChIP-seq) is a powerful technology to study the interactions between DNA and transcription factors or other chromatin-associated proteins genome-wide. With an increasing number of available ChIP-seq data sets generated for different protein targets, in different cell types and organisms, the need for comparative analyses is rising. For this, statistical inference for the comparison of ChIP-seq results, especially for their overlapping peaks, is crucial. In our study, we have analyzed 420 ChIP-seq experiments for more than 40 target proteins available from the ENCODE project. Using simulated data based on the parameters from the ENCODE data set, we provide a background model for the comparison of ChIP-seq experiments. Using parameters for the total number of peaks in both experiments and for the average peak width, we derive statistical distributions for the expected number of overlapping peaks and for the expected overlap width. Thus, using our simulated distributions, we are able to assign significance to the number and to the width of overlapping peaks of two ChIP-seq experiments. Due to the general setting of our model, our results can be used for the comparison of a large variety of genomic experiments.
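
The idea of assigning significance to a peak overlap from a simulated background can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes peaks of one fixed width placed uniformly at random on a single linear genome, and all function names and parameters (`count_overlaps`, `random_peaks`, `empirical_p_value`, `n_sim`, `seed`) are hypothetical.

```python
import random


def count_overlaps(peaks_a, peaks_b):
    """Number of peaks in A overlapping at least one peak in B (half-open intervals)."""
    return sum(
        any(a_start < b_end and b_start < a_end for b_start, b_end in peaks_b)
        for a_start, a_end in peaks_a
    )


def random_peaks(n_peaks, width, genome_length, rng):
    """Place n_peaks of fixed width uniformly at random (a simplifying assumption)."""
    starts = [rng.randrange(0, genome_length - width) for _ in range(n_peaks)]
    return [(s, s + width) for s in starts]


def empirical_p_value(observed, n_a, n_b, width, genome_length, n_sim=200, seed=0):
    """Fraction of simulated experiment pairs with at least `observed` overlapping peaks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        sim_a = random_peaks(n_a, width, genome_length, rng)
        sim_b = random_peaks(n_b, width, genome_length, rng)
        if count_overlaps(sim_a, sim_b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


# Toy usage: 50 and 60 peaks of width 200 on a 1 Mb genome, 20 observed overlaps.
p = empirical_p_value(observed=20, n_a=50, n_b=60, width=200, genome_length=1_000_000)
```

A realistic background would draw peak numbers and widths from the empirical distributions of the experiments being compared, as the abstract describes for the ENCODE data.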

E-63: Polympact: exploring the impact of inherited variants in cancer
COSI: RegSys COSI
  • Samuel Valentini, University of Trento, CIBIO, Italy
  • Francesco Gandolfi, University of Trento, CIBIO, Italy
  • Mattia Carolo, University of Trento, CIBIO, Italy
  • Alessandro Romanel, University of Trento, CIBIO, Italy

Short Abstract: Cancer is an extremely heterogeneous disease arising from complex interactions between inherited and somatic variants that shape cancer genesis and cancer evolution. Although many inherited variants are associated with the risk of developing specific cancer types, the biological functions and mechanisms involved in cancer genesis are still largely unknown. Here we present Polympact, a resource that characterizes ~18 million common inherited variants by integrating ChIP-seq data from the ENCODE and RoadMap projects, transcription factor motif data from several databases and genotype / gene expression data from the GTEx and TCGA databases. The resource, which implements a MySQL database, includes data from 729 functional elements, 238 cell-lines, 5,420 Positional Frequency Matrices and >5,000 human samples. Each variant is characterized by combining: aggregated and cell-line specific functional data evidence; the landscape of changes observed in transcription factor binding scores; a set of indexes measuring the potential impact of the variant on the expression of oncogenic signaling pathways in healthy human tissues. Using a web interface or ad-hoc Python APIs, Polympact allows single or multiple variants to be queried, providing tabular and/or visual reports. The resource represents a useful tool to support the identification and comprehension of the biological effects of inherited variants in cancer.

E-64: Identification of Different Promoter Architectures Using Non-Negative Matrix Factorization
COSI: RegSys COSI
  • Sarvesh Nikumbh, MRC London Institute of Medical Sciences & Imperial College London, United Kingdom
  • Leonie Roos, MRC London Institute of Medical Sciences & Imperial College London, United Kingdom
  • Sebastian Steinhauser, The Francis Crick Institute, United Kingdom
  • Boris Lenhard, MRC London Institute of Medical Sciences & Imperial College London, United Kingdom

Short Abstract: Animal promoters come in different classes that broadly correlate with the type of regulation they are under. These classes of promoters are characterized by enrichment of various features such as the TATA box, initiator element, and CpG island overlap at the sequence level, as well as histone modifications and nucleosome positioning/organization. Examples like the ‘shifting’ promoters in zebrafish, where the same promoter sequence encodes a different chromatin architecture before and after the maternal-to-zygotic transition during embryonic development, are intriguing. Thus, a comprehensive characterization of these distinct promoter architectures can help toward a better understanding of the diverse mechanisms of gene transcription regulation. We present promArch, an approach using non-negative matrix factorization followed by clustering to identify different promoter architectures with their characteristic features. Computational experiments on synthetic and various real promoter sequences demonstrate the efficacy of promArch: it can detect de novo features and simultaneously identify the complex interactions of different sequence features together with their positional specificities. promArch is also about an order of magnitude faster than other state-of-the-art approaches.

E-65: Cartography of DNA replication initiation zones in 12 cell lines reveals replication plasticity in late replicating regions
COSI: RegSys COSI
  • Hadi Kabalane, Laboratoire de Physique de l'ENS de Lyon, UMR CNRS 5672, Lyon, France
  • Xia Wu, Institut de Biologie de l'Ecole Normale Supérieure (IBENS),CNRS UMR8197, Inserm U1024, Paris, France
  • Olivier Hyrien, Institut de Biologie de l'Ecole Normale Supérieure (IBENS),CNRS UMR8197, Inserm U1024, Paris, France
  • Benjamin Audit, Laboratoire de Physique de l'ENS de Lyon, UMR CNRS 5672, Lyon, France

Short Abstract: For several years, the DNA replication program has been mapped in diverse cell lines to characterize the regions susceptible to replication program changes concomitant with cell differentiation or pathological instabilities. Using the Replication Fork Directionality (RFD) profiles determined by OK-seq in 12 human cell lines (Petryk, 2016; Wu, 2018), we systematically determined the locations of the most active replication initiation zones (IZ). The number of IZ with an efficiency above 1% RFD per kb is ∼5000 in 10 cell lines. IZ density decreases significantly with Mean Replication Timing (MRT), the density in early-replicating regions (MRT<0.2) being 5 times larger than in late regions (MRT>0.8). Taking into consideration all 12 RFD profiles, we computed the mean of the correlation value (MCR) between the average RFD profile and each of the 12 RFD profiles in 100 kb windows. On average, relative transcriptional changes of genes are low (resp. high) in regions of conserved (MCR>0.52) (resp. variable, MCR<0.52) replication program, suggesting a global correlation between transcriptional and replication regulation across regions. However, IZ locations in late-replicating gene-poor regions are significantly more variable than in gene-rich early regions. Hence, these data also question the existence of a systematic link between IZ efficiency variations and transcriptional changes during cell differentiation.
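
The windowed mean correlation (MCR) described in this abstract could be computed along the following lines. This is only a sketch under stated assumptions: profiles are equal-length lists of binned RFD values, Pearson correlation is used (the abstract does not name the coefficient), windows are non-overlapping, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat window is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def mean_correlation_per_window(profiles, window):
    """For each non-overlapping window, average the correlation between
    the mean profile and every individual profile."""
    n_bins = len(profiles[0])
    average = [sum(p[i] for p in profiles) / len(profiles) for i in range(n_bins)]
    result = []
    for start in range(0, n_bins - window + 1, window):
        end = start + window
        cors = [pearson(average[start:end], p[start:end]) for p in profiles]
        result.append(sum(cors) / len(cors))
    return result


# Toy data: three binned profiles, windows of 4 bins each.
toy_profiles = [
    [0.1, 0.5, -0.2, 0.3, 0.4, -0.1, 0.2, 0.0],
    [0.2, 0.4, -0.1, 0.2, -0.3, 0.5, 0.1, 0.3],
    [0.0, 0.6, -0.3, 0.4, 0.1, 0.2, -0.2, 0.1],
]
mcr = mean_correlation_per_window(toy_profiles, window=4)
```

Windows whose value falls above or below a chosen threshold (0.52 in the abstract) would then be labelled as conserved or variable.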

E-66: Benchmarking functional genomics tools on simulated and real single-cell RNA-seq data
COSI: RegSys COSI
  • Christian Holland, Institute of Computational Biomedicine, Heidelberg University, Germany
  • Bence Szalai, Semmelweis University, Faculty of Medicine, Department of Physiology, Hungary
  • Julio Saez-Rodriguez, Institute of Computational Biomedicine, Heidelberg University, Germany

Short Abstract: With the emergence of high-throughput transcriptome profiling, many tools have been developed to extract functional and mechanistic insight from bulk expression data. With the advent of single-cell RNA sequencing, it is possible to do such an analysis for single cells. However, this technology has its own limitations and characteristics, such as drop-out events, low library sizes and a comparatively large number of samples/cells. In this study we perform benchmark studies on in silico and in vitro data to explore whether functional genomic tools developed for bulk data can be applied to single-cell RNA-seq data. We focus on the tools PROGENy and DoRothEA, which estimate pathway and transcription factor (TF) activities, respectively. For the in silico study we simulate single cells from TF/pathway perturbation bulk RNA-seq experiments. Our simulation strategy guarantees that the information of the original perturbation is preserved while the characteristics of single-cell RNA-seq data are introduced. Benchmarking both tools on the simulated single cells reveals a performance comparable to that on the original bulk data. In addition, we conduct an in vitro study on selected real single-cell RNA-seq datasets. We show that the functional characterization of these data sets is in agreement with existing knowledge.

E-67: Maternal dietary glycemic index and load is associated with changes in the infants’ methylome
COSI: RegSys COSI
  • Negusse Kitaba, Biological Sciences, University of Southampton, United Kingdom
  • Antoun E, Biological Sciences, University of Southampton, United Kingdom
  • Titcombe P, MRC Lifecourse Epidemiology Unit, University of Southampton, United Kingdom
  • Kathryn Dalrymple, Department of Women and Children's Health, King's College London, United Kingdom
  • Flynn A, Department of Women and Children's Health, King's College London, United Kingdom
  • Seed Paul, Department of Women and Children's Health, King's College London, United Kingdom
  • Murray R, School of Human Health and Development, University of Southampton, United Kingdom
  • Emma Garratt, School of Human Health and Development, University of Southampton, United Kingdom
  • Barton S.J, MRC Lifecourse Epidemiology Unit, University of Southampton, United Kingdom
  • White Sara, Department of Women and Children's Health, King's College London, United Kingdom
  • Burdge Gc, School of Human Health and Development, University of Southampton, United Kingdom
  • Poston Lucilla, Department of Women and Children's Health, King's College London, United Kingdom
  • Godfrey K, MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK, United Kingdom
  • Lillycrop K, Biological Sciences, University of Southampton, United Kingdom

Short Abstract: Aim: There is substantial evidence from experimental models that maternal diet can induce persistent epigenetic and phenotypic changes in the offspring. Here, we explore the influence of maternal dietary glycemic index (GI) and load (GL) on the methylome of the child at birth. Methods: Cord blood DNA methylation was profiled using the Infinium Human Methylation array (850K) in infants (n=356) born to mothers from the UK Pregnancies Better Eating and Activity Trial (UPBEAT) of a dietary and physical activity intervention, which reduced maternal glycemic load and infant adiposity. Analysis was carried out using beta mixture quantile normalization (BMIQ) and a multiple regression model to identify differentially methylated CpGs (dmCpGs) (FDR≤0.05) associated with maternal GI and GL. GI and GL were assessed by food frequency questionnaires at 36 weeks of gestation. For this analysis, UPBEAT was considered as a cohort study with the intervention as a covariate. Results: Thirteen dmCpGs were associated with maternal dietary GI, and 21 dmCpGs with maternal GL. The dmCpGs were enriched for pathways related to glycolysis, ATP metabolism, lipid translocation, and cholesterol biosynthesis. Conclusions: This study supports the paradigm that maternal nutritional intake can modulate the methylome of the infant, with potential consequences for later offspring health.
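
The FDR≤0.05 threshold in this abstract implies a multiple-testing correction across CpGs. The abstract does not say which procedure was used; the sketch below assumes the widely used Benjamini-Hochberg adjustment, and the function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values for four hypothetical CpGs (not data from the study).
toy_p = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(q_values) if q <= 0.05]
```

In this toy example only the first CpG passes the 0.05 threshold after adjustment, even though three raw p-values are below 0.05.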

E-68: Integrative analysis of epigenetics data identifies gene-specific regulatory elements
COSI: RegSys COSI
  • Florian Schmidt, Saarland University, Germany
  • Alexander Marx, Max Planck Institute for Informatics and Saarland University, Germany
  • Jonathan Goeke, Genome Institute of Singapore, Singapore
  • Jilles Vreeken, Helmholtz Center for Information Security, Germany
  • Marcel Schulz, Goethe University Frankfurt, Germany

Short Abstract: Understanding how epigenetic variation throughout the genome, including non-coding regions, is involved in distal gene expression regulation is an important problem in computational biology. We present STITCHIT, an approach for the dissection of epigenetic variation in a gene-specific manner for the detection of regulatory elements (REMs). STITCHIT segments epigenetic signal tracks over many samples and links novel REMs to genes. We show that this approach leads to more accurate and refined REM detection compared to standard methods on different datasets obtained from the International Human Epigenome Consortium. The reliability and quality of STITCHIT predictions are supported by several validation experiments. For instance, we show that STITCHIT REMs are enriched for GWAS hits and eQTLs compared to other tested methods and to random controls. Further, STITCHIT REMs show a high overlap with a curated enhancer database, and novel predictions are enriched for the enhancer histone mark H3K27ac. A novel application illustrating the usefulness of STITCHIT is the analysis of a genome-wide CRISPR-Cas9 screen to prioritize novel doxorubicin-resistance genes and their associated non-coding REMs. STITCHIT is a freely available, easy-to-use software tool that paves the way for a simple analysis of big epigenomic data sets and enables the exploratory analysis of gene-specific regulatory landscapes.

E-69: EpiAtlas, the repository of single-cell human epigenetic profiles
COSI: RegSys COSI
  • Laurynas Kalesinskas, Stanford University, United States
  • Michele Donato, Stanford University, United States
  • Steven Schaffert, Stanford University, United States
  • Rohit Vashisht, Stanford University, United States
  • Ananth Ganesan, Stanford University, United States
  • Alex Kuo, Stanford University, United States
  • Peggie Cheung, Stanford University, United States
  • Mai Dvorak, Stanford University, United States
  • Sarah Chang, Stanford University, United States
  • Mariko Foecke, Stanford University, United States
  • Paul Utz, Stanford University, United States
  • Purvesh Khatri, Stanford University, United States

Short Abstract: Post-translational histone modifications of chromatin are a key regulator of many biological processes. However, their role in disease and in immune cells has been largely unexplored, due to lack of data at single-cell resolution. Recently, with the aid of the high-throughput Epigenetics profiling by Time-Of-Flight method (EpiTOF), we characterized levels of histone modifications in hundreds of samples at a single-cell resolution, creating EpiAtlas, the first repository of single-cell epigenetic data. To date, EpiAtlas contains single-cell histone modification measurements for more than 115 million cells from 469 human samples (215 healthy samples). Each sample was manually annotated with clinical information such as disease, sex and age. In our analyses we have been able to identify novel relationships between chromatin modifications, as well as ones previously described in literature, by comparing epigenetic profiles across healthy immune cells (73 million cells). Further, leveraging the single-cell resolution of EpiTOF, we have been able to uncover cell-type specific relationships between histone modifications, pointing to the presence of cell-type specific epigenetic regulatory networks. With more than 4 billion data points, EpiAtlas represents an invaluable resource for understanding the role of epigenetics in disease and developing methods for the analysis of epigenetic data.

E-70: Prediction of transcription factor binding by means of structural modeling
COSI: RegSys COSI
  • Alberto Meseguer, Universitat Pompeu Fabra, Spain
  • Filip Arman, Universitat Pompeu Fabra, Spain
  • Oriol Fornes, The University of British Columbia, Canada
  • Baldo Oliva, GRIB (IMIM-UPF), Spain

Short Abstract: Knowledge of transcription factor (TF) binding sites, the locations at which TFs bind to DNA in the genome, is key to understanding how genes are regulated. Yet, the binding preferences of most eukaryotic TFs remain unknown. In this scenario, the development of computational tools as a complement to experimental procedures is fundamental. Here, we introduce ModCRE, a homology modeling-based approach that combines structural information and protein binding microarray (PBM) data to predict the binding preferences of TFs and model TF-DNA interactions. ModCRE models TF-DNA complex structures for all PBM sequences with a known template binding and generates family-specific statistical potentials. We use them to score the TF-DNA binding for any TF sequence of a family with known structure and obtain a DNA position-specific weight matrix (PWM) profile. ModCRE is applied to the following tasks: 1) discriminating bound from unbound PBM 8-mers using the binding scores; 2) predicting JASPAR profiles from experimental methods (excluding PBMs); 3) predicting the PWMs of non-redundant TFs and comparing them to the profiles of the closest non-redundant homologs, demonstrating the ability to find the correct PWMs at low percentages of sequence identity; and 4) modeling the structure of the human IFN-β enhanceosome.